val df = Seq(("2021-11-05 02:46:47.154410"), ("2019-10-05 2:46:47.154410")).toDF("old_column")
display(df)

import org.apache.spark.sql.functions._

// Parse the microsecond timestamps and reformat them without the fractional part.
val df2 = df.withColumn("new_column", from_unixtime(unix_timestamp(col("old_column"), "yyyy-MM-dd HH:mm:ss.SSSSSS"), "yyyy-MM-dd HH:mm:ss"))
display(df2)
I have tested this and it should work.
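For a PySpark version of the same microsecond-stripping step, a minimal sketch could look like the following. This is not from the original answer: the SparkSession entry point and the old_column/new_column names are simply carried over from the Scala example, and to_timestamp plus date_format are used here instead of the unix_timestamp round trip.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2021-11-05 02:46:47.154410",), ("2019-10-05 02:46:47.154410",)],
    ["old_column"],
)

# Parse the microsecond string, then reformat it with second precision only.
df2 = df.withColumn(
    "new_column",
    F.date_format(
        F.to_timestamp(F.col("old_column"), "yyyy-MM-dd HH:mm:ss.SSSSSS"),
        "yyyy-MM-dd HH:mm:ss",
    ),
)
df2.show(truncate=False)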
from pyspark import SparkContext
from pyspark.sql import SQLContext
from functools import reduce
import pyspark.sql.functions as F

sc = SparkContext.getOrCreate()
sql = SQLContext(sc)

# Sample data: (customerid, date string)
input_list = [
    (1, "2019-11-07 10:30:00"), (1, "2019-11-08 10:30:00"),
    (1, "2019-11-09 10:30:00"),
    (1, "2019-11-11 10:30:00"),
    (1, "2019-11-12 10:30:00"),
    (1, "2019-11-13 10:30:00"),
    (1, "2019-11-14 10:30:00"),
    (2, "2019-11-08 10:30:00"),
    (2, "2019-11-09 10:30:00"),
    (3, "2019-11-09 10:30:00"),
    (3, "2019-11-10 10:30:00"),
    (3, "2019-11-11 10:30:00"),
    (2, "2019-11-15 10:30:00"),
    (2, "2019-11-18 10:30:00"),
    (4, "2019-11-10 10:30:00"),
    (4, "2019-11-11 10:30:00"),
]

sparkDF = sql.createDataFrame(input_list, ["customerid", "date"])

# Convert the string column into a proper timestamp column.
sparkDF = sparkDF.withColumn("date_timestamp", F.to_timestamp(F.col("date"), "yyyy-MM-dd HH:mm:ss"))
sparkDF.show()
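As a side note, SQLContext is a legacy entry point in recent Spark releases; assuming a Spark 2.x or newer installation, the same DataFrame could be built through SparkSession instead. A sketch reusing the input_list defined above:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

sparkDF = spark.createDataFrame(input_list, ["customerid", "date"])
sparkDF = sparkDF.withColumn("date_timestamp", F.to_timestamp(F.col("date"), "yyyy-MM-dd HH:mm:ss"))
sparkDF.printSchema()  # date stays a string, date_timestamp is a timestamp
sparkDF.show()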