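When I try to read data from SQL Server over a JDBC connection and merge it into a Databricks table, I get the following error. Could you please help me figure out what the issue is?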
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 188.0 failed 4 times, most recent failure: Lost task 1.3 in stage 188.0 (TID 1823) (10###.#.# executor 9): ExecutorLostFailure (executor 9 exited caused by one of the running tasks) Reason: Command exited with code 50
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3376)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3308)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3299)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3299)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1428)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1428)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1428)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3588)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3526)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3514)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
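This error occurs when we try to read data from SQL Server over a single connection. I suggest using the numPartitions, lowerBound, and upperBound options to parallelize the read.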
You can find detailed documentation here: https://docs.www.eheci.com/en/external-data/jdbc.html#:~:text=save()%0A),Control%20parallelism%20f…
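To make the suggestion a bit more concrete, here is a rough sketch of how a partitioned JDBC read splits the key range into one query per partition. It is a simplified approximation of Spark's behavior, not its actual implementation: the function name jdbc_partition_predicates and the column name id are made up for illustration, and the stride is computed with plain integer division. The point it illustrates is that lowerBound and upperBound only set the stride and do not filter rows, so the first and last partitions are open-ended and can end up much larger than the rest if the bounds or the key distribution are off.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate the WHERE clause each partition of a JDBC read would use.

    Simplified sketch: the stride is a plain integer division of the range,
    which may differ slightly from what a given Spark version computes.
    """
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")

    # Never use more partitions than there are distinct integer values in range.
    parts = max(1, min(num_partitions, upper_bound - lower_bound))
    if parts == 1:
        return [None]  # a single, unfiltered query

    stride = (upper_bound - lower_bound) // parts
    predicates = []
    current = lower_bound
    for i in range(parts):
        # The first partition has no lower limit, the last has no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < parts - 1 else None

        if lower is None:
            # NULL keys are assigned to the first partition.
            predicates.append(f"{upper} OR {column} IS NULL")
        elif upper is None:
            predicates.append(lower)
        else:
            predicates.append(f"{lower} AND {upper}")
    return predicates


if __name__ == "__main__":
    for predicate in jdbc_partition_predicates("id", 0, 100, 4):
        print(predicate)
```

If most rows fall into a single one of these ranges, that one task still pulls most of the table through one connection, so it is worth checking how the partition column is actually distributed.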
Hi @Tharun-Kumar. I am already using the numPartitions, lowerBound, and upperBound options to parallelize the read, and I still see the same error.
df = spark.read.option("numPartitions", 32).option("fetchSize", "1000").option("partitionColumn", "Key").option("lowerBound", min_o).option("upperBound", max_o).jdbc(url=jdbcUrl, table=f"({query_attr}) t", properties=connectionProperties)