pyspark.pandas.DataFrame.spark.frame

spark.frame(index_col: Union[str, List[str], None] = None) → pyspark.sql.dataframe.DataFrame

Return the current DataFrame as a Spark DataFrame. DataFrame.spark.frame() is an alias of DataFrame.to_spark().

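Parameters

index_col: str or list of str, optional, default: None
    Column names to be used in Spark to represent the pandas-on-Spark index. The index name in pandas-on-Spark is ignored. By default, the index is always lost.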
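The three accepted forms of index_col (None, a single name, or a list of names) can be sketched in plain Python. The helper below is hypothetical and not part of pyspark. It assumes that the number of names must match the number of index levels and that the names must not collide with data columns; treat both checks as assumptions to verify against your Spark version.

```python
from typing import List, Optional, Sequence, Union


def normalize_index_col(
    index_col: Union[str, List[str], None],
    n_index_levels: int,
    data_columns: Sequence[str],
) -> Optional[List[str]]:
    """Hypothetical helper: turn index_col into a list of names, or None."""
    if index_col is None:
        # No names given: the index would not appear in the Spark result.
        return None
    # A single string is treated as a one-element list.
    names = [index_col] if isinstance(index_col, str) else list(index_col)
    # Assumed rule: one name per index level.
    if len(names) != n_index_levels:
        raise ValueError(
            f"index has {n_index_levels} level(s) "
            f"but index_col has {len(names)} name(s)"
        )
    # Assumed rule: index column names must not clash with data columns.
    overlap = [name for name in names if name in data_columns]
    if overlap:
        raise ValueError(f"index_col overlaps with data columns: {overlap}")
    return names
```

For a single-level index, normalize_index_col("index", 1, ["a", "b", "c"]) yields a one-element list, and the list form covers the multi-index case.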

See also

DataFrame.to_spark
DataFrame.pandas_api
DataFrame.spark.frame

Examples

By default, this method loses the index, as shown below.

    >>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
    >>> df.to_spark().show()
    +---+---+---+
    |  a|  b|  c|
    +---+---+---+
    |  1|  4|  7|
    |  2|  5|  8|
    |  3|  6|  9|
    +---+---+---+
          

    >>> df = ps.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
    >>> df.spark.frame().show()
    +---+---+---+
    |  a|  b|  c|
    +---+---+---+
    |  1|  4|  7|
    |  2|  5|  8|
    |  3|  6|  9|
    +---+---+---+
          

If index_col is set, the index is kept as a column with the specified name.

    >>> df.to_spark(index_col="index").show()
    +-----+---+---+---+
    |index|  a|  b|  c|
    +-----+---+---+---+
    |    0|  1|  4|  7|
    |    1|  2|  5|  8|
    |    2|  3|  6|  9|
    +-----+---+---+---+
          

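Keeping the index column is useful when you want to call some Spark APIs and then convert the result back to a pandas-on-Spark DataFrame without creating a default index, which can affect performance.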

    >>> spark_df = df.to_spark(index_col="index")
    >>> spark_df = spark_df.filter("a == 2")
    >>> spark_df.pandas_api(index_col="index")
           a  b  c
    index
    1      2  5  8
          

In the case of a multi-index, pass a list to index_col.

    >>> new_df = df.set_index("a", append=True)
    >>> new_spark_df = new_df.to_spark(index_col=["index_1", "index_2"])
    >>> new_spark_df.show()
    +-------+-------+---+---+
    |index_1|index_2|  b|  c|
    +-------+-------+---+---+
    |      0|      1|  4|  7|
    |      1|      2|  5|  8|
    |      2|      3|  6|  9|
    +-------+-------+---+---+
          

Likewise, it can be converted back to a pandas-on-Spark DataFrame.

    >>> new_spark_df.pandas_api(
    ...     index_col=["index_1", "index_2"])
                     b  c
    index_1 index_2
    0       1        4  7
    1       2        5  8
    2       3        6  9
          
