Issue with reading exported tables stored in parqu… - Databricks - 7746

shiva12494 · ‎03-14-2023

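Hello, I exported snapshots of all my tables from Postgres to S3 in Parquet format. I want to read the tables using Databricks, but I am unable to do so. I get the following error: "Unable to infer schema for Parquet. It must be specified manually." I tried specifying the schema, but it still won't work. I didn't need to specify a schema to read Parquet files before this, so I'm wondering what is different here. I also tried copying the Parquet file to my local machine and got an error relating to ciphertext. I have attached screenshots of the error and the file names. Any help is appreciated.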

Anonymous · ‎03-24-2023

@shiva charan velichala:

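It's possible that the Parquet files exported from your Postgres snapshot are encrypted or compressed. If that is the case, you will need to decrypt and decompress the files before you can read them in Databricks.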
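One quick way to check this on a file you have copied locally is to look at its magic bytes. Below is a minimal sketch, assuming the file is on the local filesystem; the function name classify_parquet_file and its return labels are made up for illustration. A plain Parquet file begins and ends with the 4 bytes PAR1, and a file written with Parquet's encrypted-footer mode uses PARE instead. Anything else suggests the file is not Parquet as stored, or that the whole object is encrypted or wrapped in some other way.

```python
import os

PLAIN_MAGIC = b"PAR1"             # marker of a plain Parquet file
ENCRYPTED_FOOTER_MAGIC = b"PARE"  # marker used by Parquet encrypted-footer mode


def classify_parquet_file(path):
    """Classify a local file by its first and last 4 bytes.

    The function name and the returned labels are hypothetical,
    chosen only for this illustration.
    """
    if os.path.getsize(path) < 8:
        return "too-small"  # cannot hold both a leading and a trailing marker
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head == PLAIN_MAGIC and tail == PLAIN_MAGIC:
        return "plain"
    if head == ENCRYPTED_FOOTER_MAGIC and tail == ENCRYPTED_FOOTER_MAGIC:
        return "encrypted-footer"
    return "unknown"  # not Parquet as stored, or encrypted/wrapped as a whole
```

If this returns "plain" for a local copy, the file itself is probably fine and the problem is more likely in how it is being read.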

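Alternatively, if the schema is not being inferred correctly, you can specify it manually by passing it to the reader's schema function in Databricks. For example: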

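from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Third argument of StructField is nullable
my_schema = StructType([
    StructField("column1", StringType(), True),
    StructField("column2", IntegerType(), True),
    # … remaining fields
])

# `spark` is the SparkSession that Databricks notebooks provide
df = spark.read.schema(my_schema).parquet("/path/to/parquet/file")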

Replace column1, column2, etc. with the actual column names and types in your schema.

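If you are still having issues, you may want to try opening the Parquet files in another program (for example, Apache Arrow) to see whether you are able to access them.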

Vidula_Khanna · ‎03-25-2023

Hi @shiva charan velichala

Thank you for posting your question in our community! We are happy to assist you.

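To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?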

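This will also help other community members who may have similar questions in the future. Thank you for your participation, and let us know if you need any further assistance!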
