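Problem
You have an Apache Spark job that is failing with a Java assertion error: java.lang.AssertionError: assertion failed: Conflicting directory structures detected.
Example stack trace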
Caused by: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
=== Streaming Query ===
Identifier: [id = aabc5549-cb4b-4e4e-9403-4e793f4824a0, runId = 4e743dda-909f-4932-9489-3dd0b364d811]
Current Committed Offsets: {}
Current Available Offsets: {CloudFilesSource[://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]: {'seqNum':423,'sourceVersion':1}}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
CloudFilesSource[://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt]
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:385)
    at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:268)
Caused by: java.lang.RuntimeException: There was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option('cloudFiles.partitionColumns', 'comma-separated-list')
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesErrors$.partitionInferenceError(CloudFilesErrors.scala:115)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.liftedTree1$1(CloudFilesSourceFileIndex.scala:65)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceFileIndex.partitionSpec(CloudFilesSourceFileIndex.scala:63)
    at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
    at com.databricks.sql.fileNotification.autoIngest.CloudFilesSource.getBatch(CloudFilesSource.scala:361)
    ... 1 more
Caused by: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
    ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
    ://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt
If provided paths are partition directories, please set 'basePath' in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
    at scala.Predef$.assert(Predef.scala:223)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:204)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parseP
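Cause
The storage location contains conflicting directory paths.
In the example stack trace, we see two conflicting directory paths:
- <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt
- <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt
Because these directories are in the same hierarchy (the second is nested inside the first), updates made at the root or branch level can result in conflicts.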
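To make the conflict concrete, here is a minimal Python sketch of the kind of check that partition inference performs. It is a simplified approximation, not Spark's actual implementation: it assumes that any trailing path segment containing "=" is a partition directory, strips those segments to find a table root, and reports a conflict when the data directories do not agree on a single root. The function names infer_base_path and has_conflict are hypothetical.

```python
from typing import Iterable, Set


def infer_base_path(leaf_dir: str) -> str:
    """Strip trailing `column=value` segments to approximate the table root."""
    segments = leaf_dir.rstrip("/").split("/")
    # Walk up from the leaf while the last segment looks like a partition directory.
    while segments and "=" in segments[-1]:
        segments.pop()
    return "/".join(segments)


def distinct_base_paths(leaf_dirs: Iterable[str]) -> Set[str]:
    """Collect every root directory implied by the given data directories."""
    return {infer_base_path(d) for d in leaf_dirs}


def has_conflict(leaf_dirs: Iterable[str]) -> bool:
    """A layout conflicts (in this simplified model) if it implies more than one root."""
    return len(distinct_base_paths(leaf_dirs)) > 1


# The two directories from the stack trace, both holding data files.
conflicting = [
    "<file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt",
    "<file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/clfy_x_clfy_evt",
]
print(has_conflict(conflicting))  # True
```

Neither path ends in a column=value segment, so each one is treated as its own root, and the two roots disagree.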
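Solution
Avoid making multiple concurrent updates within a hierarchical directory structure, and avoid writing updates into the same partition.
Once a conflict is detected, set up separate, distinct paths for the updates. Alternatively, you can add more partitions.
These example directories do not conflict:
- <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt1
- <file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt/evt=clfy_x_clfy_evt2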
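One way to keep writers on a layout like this is to derive every output directory from the table root plus column=value segments, so that no job writes into a nested, unpartitioned subdirectory. The helper below is a hypothetical sketch (the name partition_dir is not part of any Spark or Databricks API); it only builds the path string and does not touch storage.

```python
def partition_dir(table_root: str, **partition_values: str) -> str:
    """Return `<table_root>/<column>=<value>/...` for the given partition values."""
    if not partition_values:
        # Refuse to hand back the bare root so callers never write directly under it.
        raise ValueError("at least one partition column is required")
    # Keyword arguments keep their insertion order, so nesting follows call order.
    segments = [f"{column}={value}" for column, value in partition_values.items()]
    return "/".join([table_root.rstrip("/"), *segments])


root = "<file-system>://domain.com/km/gold/cfy_gold/clfy_x_clfy_evt"
print(partition_dir(root, evt="clfy_x_clfy_evt1"))
print(partition_dir(root, evt="clfy_x_clfy_evt2"))
```

Each update then lands in its own evt=... directory under a single shared root, which matches the non-conflicting layout listed above.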