The combination of functions you'll use:
pyspark.sql.functions.from_unixtime(timestamp, format='yyyy-MM-dd HH:mm:ss')
(docs) and
pyspark.sql.functions.unix_timestamp(timestamp=None, format='yyyy-MM-dd HH:mm:ss')
(docs)
from pyspark.sql.types import StringType, TimestampType
from pyspark.sql.functions import unix_timestamp, from_unixtime

df = spark.createDataFrame(["6/3/2019 5:06:00 PM"], StringType()).toDF("ts_string")

# convert the string to a timestamp
df1 = df.select(from_unixtime(unix_timestamp('ts_string', 'MM/dd/yyyy hh:mm:ss a')).cast(TimestampType()).alias("timestamp"))

# change the timestamp format
df2 = df1.select(from_unixtime(unix_timestamp('timestamp'), 'hh:mm:ss').alias("timestamp2"))

# all together
df3 = df.select('ts_string',
                from_unixtime(unix_timestamp('ts_string', 'MM/dd/yyyy hh:mm:ss a')).cast(TimestampType()).alias("timestamp"),
                from_unixtime(unix_timestamp(from_unixtime(unix_timestamp('ts_string', 'MM/dd/yyyy hh:mm:ss a')).cast(TimestampType())), 'hh:mm:ss').alias("timestamp2"))
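Note that `unix_timestamp` and `from_unixtime` take Java SimpleDateFormat-style patterns (e.g. `MM/dd/yyyy hh:mm:ss a`), which differ from Python's `strptime` codes. As a quick sanity check of what those patterns mean, here is the same parse-then-reformat round trip sketched with the Python standard library alone (a hedged sketch, not Spark code; the `%`-codes are my assumed `strptime` equivalents of the Spark pattern):

```python
from datetime import datetime

# strptime equivalent of the Spark pattern 'MM/dd/yyyy hh:mm:ss a'
ts = datetime.strptime("6/3/2019 5:06:00 PM", "%m/%d/%Y %I:%M:%S %p")
print(ts)  # 2019-06-03 17:06:00

# strftime equivalent of reformatting with 'hh:mm:ss' (12-hour clock)
print(ts.strftime("%I:%M:%S"))  # 05:06:00
```

This mirrors the Spark pipeline above: parse a 12-hour string into a real timestamp, then render only the time portion.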