Hi folks,
I'm running a production pipeline (Databricks Runtime 7.3 LTS) that is failing on some Delta file reads with the error below:
21/07/19 09:56:02 ERROR Executor: Exception in task 36.1 in stage 2.0 (TID 58)
com.databricks.sql.io.FileReadException: Error while reading file dbfs:/delta/dbname/table/part-00002-6df5def6-4670-4522-bed9-bcef79a172bc-c000.snappy.parquet.
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.logFileNameAndThrow(FileScanRDD.scala:347)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:326)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:258)
    at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:716)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:733)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:2008)
    at org.apache.spark.rdd.RDD.$anonfun$count$1(RDD.scala:1234)
    at org.apache.spark.rdd.RDD.$anonfun$count$1$adapted(RDD.scala:1234)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2379)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
    at org.apache.spark.scheduler.Task.run(Task.scala:117)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:640)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:643)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLException: Connection reset; Request ID: BABR4P3PP4X21SWG, Extended Request ID: SxgYnGm6XJNalP0H2c339Kq4/H7N2P8x09C/GxxMHnNwdGCnhyPlQv15SLRJ+eALsIEKRvvcbvg=, Cloud Provider: AWS, Instance ID: i-0a9a161dac10f903a
    at sun.security.ssl.Alert.createSSLException(Alert.java:127)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:348)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:291)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:286)
What's strange about this error is that it does not occur on the same dataset when I run the same spark.read operation from a notebook. The error only happens when it runs as a job. To highlight the chain: a SparkException caused by an SSLException, which is in turn caused by a SocketException.
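For context, the read in the job is essentially the following (a minimal sketch; "dbname" and the table path are placeholders standing in for the real production names):

```python
# Minimal sketch of the failing job step; the Delta table path is a
# placeholder. The stack trace above points at a count() action
# (RDD.$anonfun$count$1), so that is where the FileReadException surfaces.
df = spark.read.format("delta").load("dbfs:/delta/dbname/table")
print(df.count())
```

Pasted into an interactive notebook, the same read completes without errors.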
As it turns out, a similar issue appears to have been documented last week in the Databricks knowledge base: https://kb.www.eheci.com/dbfs/s3-connection-reset-error.html
So my question is: how do I remove the Spark configurations that the KB article points to, namely
spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl com.databricks.s3a.S3AFileSystem
spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
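For what it's worth, this is how I've been checking whether those overrides are actually set on the job cluster (a quick PySpark sketch; passing a default to spark.conf.get avoids an exception for unset keys):

```python
# Check whether the S3A filesystem overrides from the KB article are
# present in the cluster's Spark conf. spark.conf.get(key, default)
# returns the default instead of raising when the key is not set.
for key in [
    "spark.hadoop.fs.s3.impl",
    "spark.hadoop.fs.s3n.impl",
    "spark.hadoop.fs.s3a.impl",
]:
    print(key, "=", spark.conf.get(key, "<not set>"))
```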
Any further information beyond what's in the article would be much appreciated.
Regards,