Solved: Reading data from a URL using Spark - Databricks

AryaMa · ‎07-12-2019

I am reading data from a URL using Spark on Databricks Community Edition and I get a path-related error. Any suggestions?

    url = "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/adult.csv"
    from pyspark import SparkFiles
    spark.sparkContext.addFile(url)
    # sc.addFile(url)
    # sqlContext = SQLContext(sc)
    # df = sqlContext.read.csv(SparkFiles.get("adult.csv"), header=True, inferSchema=True)
    df = spark.read.csv(SparkFiles.get("adult.csv"), header=True, inferSchema=True)

Error:

    Path does not exist: dbfs:/local_disk0/spark-9f23ed57-133e-41d5-91b2-12555d641961/userFiles-d252b3ba-499c-42c9-be48-96358357fb75/adult.csv
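
A likely reading of this error: `SparkFiles.get` returns a path on the local disk, but `spark.read.csv` resolves a path without a scheme against the default filesystem, which on Databricks is `dbfs:/`. The sketch below adds an explicit `file://` scheme. The function name `read_csv_added_by_url` is hypothetical, and the approach assumes a single-node cluster such as Community Edition, since on a multi-node cluster the local path may not be the same on every node.

```python
import os


def read_csv_added_by_url(spark, url):
    """Download `url` with SparkContext.addFile and read it from the local disk."""
    # Imported here so the function can be defined where pyspark is not installed.
    from pyspark import SparkFiles

    spark.sparkContext.addFile(url)

    # SparkFiles.get returns a path on the local filesystem, not in DBFS.
    local_path = SparkFiles.get(os.path.basename(url))

    # The explicit "file://" scheme keeps Spark from resolving the path against dbfs:/.
    return spark.read.csv("file://" + local_path, header=True, inferSchema=True)
```

In a notebook this would be called as `df = read_csv_added_by_url(spark, url)`, where `spark` is the session the notebook already provides.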

RantoB · ‎11-02-2021

I found an answer that works:

Read CSV directly from a URL with PySpark (www.eheci.com)

Thanks

DonatienTessier · ‎07-16-2019

Hi @rr_5454,

You will find the answer here: https://forums.www.eheci.com/questions/10648/upload-local-files-into-dbfs-1.html

You will need to:

Get the file onto the local file storage
Move the file to DBFS
Load the file into a DataFrame

This is one of the possible solutions.
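
The three steps above could be sketched roughly as follows. This is only an illustration, not the code from the linked answer: the function name `load_csv_from_url` and the `dbfs_dir` parameter are hypothetical, and `spark` and `dbutils` are assumed to be the objects a Databricks notebook provides, passed in explicitly so the function does not depend on notebook globals.

```python
import os
import tempfile
import urllib.request


def load_csv_from_url(url, dbfs_dir, spark, dbutils):
    """Download `url` to the driver, move it into DBFS, and read it with Spark."""
    file_name = os.path.basename(url)
    dbfs_dir = dbfs_dir.rstrip("/")

    # Step 1: fetch the file onto the driver's local disk.
    local_path = os.path.join(tempfile.mkdtemp(), file_name)
    urllib.request.urlretrieve(url, local_path)

    # Step 2: move it from the local disk into DBFS.
    # The "file:" scheme tells dbutils that the source is on the local filesystem.
    dbfs_path = f"{dbfs_dir}/{file_name}"
    dbutils.fs.mkdirs(dbfs_dir)
    dbutils.fs.mv(f"file:{local_path}", dbfs_path)

    # Step 3: load the DBFS copy into a DataFrame.
    return spark.read.csv(dbfs_path, header=True, inferSchema=True)
```

In a notebook, a call might look like `df = load_csv_from_url(url, "dbfs:/FileStore/data", spark, dbutils)`.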

THIAM_HUATTAN · ‎08-08-2019

I am facing the same issue. Could you provide some code to help? Thanks.

dazfuller · ‎09-28-2021

Code for anyone else facing the same issue, without having to move the file to a different path:

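    import requests

    CHUNK_SIZE = 4096

    # Stream the download and write it through the local /dbfs mount
    with requests.get("https://raw.githubusercontent.com/suy1968/Adult.csv-Dataset/main/adult.csv", stream=True) as resp:
        if resp.ok:
            with open("/dbfs/FileStore/data/adult.csv", "wb") as f:
                for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                    f.write(chunk)

    # Read the same file back through its dbfs:/ path
    display(spark.read.csv("dbfs:/FileStore/data/adult.csv", header=True, inferSchema=True))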

I had to use a different URL because the one in the original question is no longer available.
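
The code above writes to `/dbfs/FileStore/...` with plain Python file I/O and then reads `dbfs:/FileStore/...` with Spark. Both spellings refer to the same file: `/dbfs` is the local mount of DBFS on the driver, while `dbfs:/` is the URI form that Spark and `dbutils` use. A minimal sketch of the mapping, with hypothetical helper names that are not part of any Databricks API:

```python
def dbfs_uri_to_local(path):
    """Map 'dbfs:/a/b' to '/dbfs/a/b', the local mount view of the same file."""
    prefix = "dbfs:/"
    if not path.startswith(prefix):
        raise ValueError(f"expected a dbfs:/ URI, got {path!r}")
    return "/dbfs/" + path[len(prefix):].lstrip("/")


def local_to_dbfs_uri(path):
    """Map '/dbfs/a/b' to 'dbfs:/a/b', the URI form used by Spark and dbutils."""
    prefix = "/dbfs/"
    if not path.startswith(prefix):
        raise ValueError(f"expected a path under /dbfs/, got {path!r}")
    return "dbfs:/" + path[len(prefix):]
```

Keeping the two forms in one place helps avoid mixing them up, for example passing a `/dbfs/...` path to `spark.read` or a `dbfs:/...` URI to `open`.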

lemfo · Wednesday

Hello,
I have this exact code here, but it still doesn't work. It says "No such file or directory".
Is this a limitation of the Community Edition?

    import requests

    CHUNK_SIZE = 4096

    def get_remote_file(dataSrcUrl, destFile):
        '''Simple old-style Python function to load a remote URL into local HDFS'''
        # Prefix the destination with the local /dbfs mount point
        destFile = "/dbfs" + destFile
        with requests.get(dataSrcUrl, stream=True) as resp:
            if resp.ok:
                with open(destFile, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                        f.write(chunk)

    get_remote_file("https://gitlab.com/opstar/share20/-/raw/master/university.json", "/Filestore/data/lgdt/university.json")

The directory "dbfs:/Filestore/data/lgdt" does exist; I can see it when running the dbutils.fs.ls(path) command.
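
Note that `open()` looks at the driver's local filesystem (`/dbfs/Filestore/...`), whereas `dbutils.fs.ls` queries DBFS through its own API, so a directory can be listed by `dbutils` and still be missing from the local view. One way to narrow this down is to check how much of the local path actually exists. The helper below is a hypothetical diagnostic sketch, not something from the thread:

```python
import os


def deepest_existing_ancestor(path):
    """Return the longest leading part of `path` that exists on the local filesystem."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current
```

Running `deepest_existing_ancestor("/dbfs/Filestore/data/lgdt/university.json")` on the driver shows where the local path stops existing. If the result does not even include `/dbfs`, the local mount is probably not available on that cluster, and downloading to a local directory first and then copying with `dbutils.fs.cp` or `dbutils.fs.mv` may be worth trying instead.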