Update Delta Lake table schema

Delta Lake lets you update the schema of a table. The following types of changes are supported:

Adding new columns (at arbitrary positions)
Reordering existing columns
Renaming existing columns

You can make these changes explicitly using DDL or implicitly using DML.

Important

When you update a Delta table schema, streams that read from that table terminate. If you want the stream to continue, you must restart it.

For recommended methods, see Production considerations for Structured Streaming.

Explicitly update schema to add columns

    ALTER TABLE table_name ADD COLUMNS (col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)

By default, nullability is true.

To add a column to a nested field, use:

    ALTER TABLE table_name ADD COLUMNS (col_name.nested_col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name], ...)

For example, if the schema before running ALTER TABLE boxes ADD COLUMNS (colB.nested STRING AFTER field1) is:

    - root
    | - colA
    | - colB
    | +-field1
    | +-field2

the schema after is:

    - root
    | - colA
    | - colB
    | +-field1
    | +-nested
    | +-field2

Note

Adding nested columns is supported only for structs. Arrays and maps are not supported.
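The placement rules above can be sketched as a small pure-Python toy model. It is not the Delta Lake API: a schema is assumed to be a nested dict whose insertion order stands in for column order, and `add_column` is a hypothetical helper. It reproduces the `boxes` example and refuses nested additions under a non-struct column.

```python
from typing import Dict, Optional

# Toy schema (assumption): field name -> type string, or a nested dict for a struct.
# Dict insertion order stands in for column order.
Schema = Dict[str, object]


def add_column(schema: Schema, path: str, data_type: str,
               first: bool = False, after: Optional[str] = None) -> Schema:
    """Return a copy of `schema` with `path` (e.g. "colB.nested") added.

    Placement follows ADD COLUMNS: FIRST, AFTER <sibling>, or, when neither
    is given, the end of the parent struct.
    """
    head, _, rest = path.partition(".")
    if rest:
        child = schema[head]
        if not isinstance(child, dict):
            # Only structs can receive nested columns (not arrays or maps).
            raise TypeError(f"{head} is not a struct")
        updated = dict(schema)  # copying keeps the original column order
        updated[head] = add_column(child, rest, data_type, first, after)
        return updated
    if head in schema:
        raise ValueError(f"column {head} already exists")
    items = list(schema.items())
    if first:
        position = 0
    elif after is not None:
        position = [name for name, _ in items].index(after) + 1
    else:
        position = len(items)
    items.insert(position, (head, data_type))
    return dict(items)


boxes = {"colA": "string", "colB": {"field1": "string", "field2": "string"}}
evolved = add_column(boxes, "colB.nested", "string", after="field1")
print(list(evolved["colB"]))  # ['field1', 'nested', 'field2']
```

The helper returns a new dict rather than mutating its input, so the "before" and "after" schemas can be compared side by side, as in the example trees.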

Explicitly update schema to change column comment or ordering

    ALTER TABLE table_name ALTER [COLUMN] col_name (COMMENT col_comment | FIRST | AFTER colA_name)

To change a column in a nested field, use:

    ALTER TABLE table_name ALTER [COLUMN] col_name.nested_col_name (COMMENT col_comment | FIRST | AFTER colA_name)

For example, if the schema before running ALTER TABLE boxes ALTER COLUMN colB.field2 FIRST is:

    - root
    | - colA
    | - colB
    | +-field1
    | +-field2

the schema after is:

    - root
    | - colA
    | - colB
    | +-field2
    | +-field1
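Reordering can be sketched the same way. This hypothetical `move_column` helper works on a toy nested-dict schema (not a Delta object), models only the `FIRST` and `AFTER` clauses, and ignores `COMMENT`.

```python
from typing import Dict, Optional

# Toy schema (assumption): field name -> type string, or a nested dict for a struct.
Schema = Dict[str, object]


def move_column(schema: Schema, path: str, first: bool = False,
                after: Optional[str] = None) -> Schema:
    """Return a copy of `schema` with the column at `path` repositioned."""
    if first == (after is not None):
        raise ValueError("pass exactly one of first=True or after=<name>")
    head, _, rest = path.partition(".")
    if rest:
        updated = dict(schema)  # keeps the position of the parent struct
        updated[head] = move_column(schema[head], rest, first, after)
        return updated
    if head not in schema:
        raise KeyError(head)
    # Remove the column, then re-insert it at the requested position.
    items = [(name, t) for name, t in schema.items() if name != head]
    if first:
        position = 0
    else:
        position = [name for name, _ in items].index(after) + 1
    items.insert(position, (head, schema[head]))
    return dict(items)


boxes = {"colA": "string", "colB": {"field1": "string", "field2": "string"}}
reordered = move_column(boxes, "colB.field2", first=True)
print(list(reordered["colB"]))  # ['field2', 'field1']
```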
            

Explicitly update schema to replace columns

    ALTER TABLE table_name REPLACE COLUMNS (col_name1 col_type1 [COMMENT col_comment1], ...)

For example, when running the following DDL:

    ALTER TABLE boxes REPLACE COLUMNS (colC STRING, colB STRUCT<field2:STRING, nested:STRING, field1:STRING>, colA STRING)

if the schema before is:

    - root
    | - colA
    | - colB
    | +-field1
    | +-field2

the schema after is:

    - root
    | - colC
    | - colB
    | +-field2
    | +-nested
    | +-field1
    | - colA
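Because REPLACE COLUMNS takes the complete new column list in order, it can help to generate the statement from a description of the target schema. The sketch below assumes a toy nested-dict schema where a dict denotes a STRUCT; `sql_type` and `replace_columns_ddl` are hypothetical helpers that only format strings, and COMMENT clauses are left out. With the schema shown it yields the DDL from the example above.

```python
from typing import Dict

# Toy schema (assumption): column name -> SQL type string, or a nested dict for a STRUCT.
Schema = Dict[str, object]


def sql_type(field_type: object) -> str:
    """Render a toy field type as a Spark SQL type string."""
    if isinstance(field_type, dict):
        inner = ", ".join(f"{name}:{sql_type(t)}" for name, t in field_type.items())
        return f"STRUCT<{inner}>"
    return str(field_type)


def replace_columns_ddl(table: str, schema: Schema) -> str:
    """Build ALTER TABLE ... REPLACE COLUMNS with the full column list, in order."""
    if not schema:
        raise ValueError("REPLACE COLUMNS needs at least one column")
    columns = ", ".join(f"{name} {sql_type(t)}" for name, t in schema.items())
    return f"ALTER TABLE {table} REPLACE COLUMNS ({columns})"


target = {
    "colC": "STRING",
    "colB": {"field2": "STRING", "nested": "STRING", "field1": "STRING"},
    "colA": "STRING",
}
print(replace_columns_ddl("boxes", target))
```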
            

Explicitly update schema to rename columns

Preview

This feature is in Public Preview.

Note

This feature is available in Databricks Runtime 10.2 and above.

To rename columns without rewriting any of the columns' existing data, you must enable column mapping for the table. See Rename and drop columns with Delta Lake column mapping.

To rename a column:

    ALTER TABLE table_name RENAME COLUMN old_col_name TO new_col_name

To rename a nested field:

    ALTER TABLE table_name RENAME COLUMN col_name.old_nested_field TO new_nested_field

For example, when you run the following command:

    ALTER TABLE boxes RENAME COLUMN colB.field1 TO field001

if the schema before is:

    - root
    | - colA
    | - colB
    | +-field1
    | +-field2

the schema after is:

    - root
    | - colA
    | - colB
    | +-field001
    | +-field2

See Rename and drop columns with Delta Lake column mapping.
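The following toy model, in plain Python, illustrates both the nested rename and the column mapping prerequisite. The dict `table_properties` stands in for the table's properties; `delta.columnMapping.mode` is the property the linked column mapping page uses to enable the feature (the protocol-version properties it also requires are not modeled). `rename_column` is a hypothetical helper, not a Delta API.

```python
from typing import Dict

# Toy schema (assumption): field name -> type string, or a nested dict for a struct.
Schema = Dict[str, object]


def rename_column(schema: Schema, table_properties: Dict[str, str],
                  path: str, new_name: str) -> Schema:
    """Return a copy of `schema` with the field at `path` renamed in place."""
    # Renaming requires column mapping; "none" means it is disabled.
    if table_properties.get("delta.columnMapping.mode", "none") == "none":
        raise RuntimeError("enable column mapping before renaming columns")
    head, _, rest = path.partition(".")
    if rest:
        updated = dict(schema)
        updated[head] = rename_column(schema[head], table_properties, rest, new_name)
        return updated
    if head not in schema:
        raise KeyError(head)
    if new_name in schema:
        raise ValueError(f"column {new_name} already exists")
    # Rebuilding the dict keeps the renamed field at its original position.
    return {(new_name if name == head else name): t for name, t in schema.items()}


props = {"delta.columnMapping.mode": "name"}
boxes = {"colA": "string", "colB": {"field1": "string", "field2": "string"}}
renamed = rename_column(boxes, props, "colB.field1", "field001")
print(list(renamed["colB"]))  # ['field001', 'field2']
```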

Explicitly update schema to drop columns

Preview

This feature is in Public Preview.

Note

This feature is available in Databricks Runtime 11.0 and above.

To drop columns as a metadata-only operation without rewriting any data files, you must enable column mapping for the table. See Rename and drop columns with Delta Lake column mapping.

Important

Dropping a column from metadata does not delete the underlying data for the column in files. To purge the dropped column data, you can use REORG TABLE to rewrite files. You can then use VACUUM to physically delete the files that contain the dropped column data.

To drop a column:

    ALTER TABLE table_name DROP COLUMN col_name

To drop multiple columns:

    ALTER TABLE table_name DROP COLUMNS (col_name_1, col_name_2)
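Putting the drop and purge steps together, this hypothetical helper only builds the SQL strings in order. It assumes column mapping is already enabled and that `REORG TABLE ... APPLY (PURGE)` is available on your runtime (check the linked REORG TABLE reference). In a notebook you might pass each string to `spark.sql`. Keep in mind that VACUUM only deletes unreferenced files older than the table's retention threshold, so the dropped data is not removed immediately.

```python
from typing import List, Sequence


def drop_and_purge_statements(table: str, columns: Sequence[str]) -> List[str]:
    """Return SQL for dropping columns and then purging their data.

    Assumes column mapping is already enabled on `table`.
    """
    if not columns:
        raise ValueError("no columns to drop")
    if len(columns) == 1:
        drop = f"ALTER TABLE {table} DROP COLUMN {columns[0]}"
    else:
        drop = f"ALTER TABLE {table} DROP COLUMNS ({', '.join(columns)})"
    return [
        drop,                                  # metadata-only change
        f"REORG TABLE {table} APPLY (PURGE)",  # rewrite files without the dropped data
        f"VACUUM {table}",                     # delete unreferenced files past retention
    ]


for statement in drop_and_purge_statements("boxes", ["colA", "colC"]):
    print(statement)
```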
            

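Explicitly update schema to change column type or name

You can change a column's type or name, or drop a column, by rewriting the table. To do this, use the overwriteSchema option.

The following example shows changing a column type:

    from pyspark.sql.functions import col

    # Read the table, cast birthDate to DATE, and overwrite both data and schema.
    (spark.read.table(...)
      .withColumn("birthDate", col("birthDate").cast("date"))
      .write
      .mode("overwrite")
      .option("overwriteSchema", "true")
      .saveAsTable(...)
    )

The following example shows changing a column name:

    # Rename dateOfBirth to birthDate and overwrite both data and schema.
    (spark.read.table(...)
      .withColumnRenamed("dateOfBirth", "birthDate")
      .write
      .mode("overwrite")
      .option("overwriteSchema", "true")
      .saveAsTable(...)
    )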
            

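Add columns with automatic schema update

Columns that are present in the DataFrame but missing from the table are automatically added as part of a write transaction when either of the following is true:

write or writeStream has .option("mergeSchema", "true")
spark.databricks.delta.schema.autoMerge.enabled is true

When both options are specified, the option from the DataFrameWriter takes precedence. The added columns are appended to the end of the struct they are present in. Case is preserved when appending a new column.

Note

mergeSchema cannot be used with INSERT INTO or .write.insertInto().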
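The precedence rule and the append behavior can be modeled in a few lines of plain Python. This is a sketch, not Delta's implementation: `merge_schema_enabled` and `columns_after_write` are hypothetical, only top-level columns are handled, and name matching is assumed to be case-insensitive (Spark's default).

```python
from typing import List, Optional


def merge_schema_enabled(writer_option: Optional[str], session_conf: Optional[str]) -> bool:
    """Decide whether schema merging applies to a write.

    writer_option: value given via .option("mergeSchema", ...), or None if unset.
    session_conf: spark.databricks.delta.schema.autoMerge.enabled, or None if unset.
    The writer option takes precedence when both are specified.
    """
    chosen = writer_option if writer_option is not None else session_conf
    return (chosen or "false").strip().lower() == "true"


def columns_after_write(table_cols: List[str], df_cols: List[str], merge: bool) -> List[str]:
    """Append DataFrame-only columns to the end, keeping their original case.

    Matching is case-insensitive here (assumed default name resolution).
    """
    existing = {name.lower() for name in table_cols}
    new_cols = [name for name in df_cols if name.lower() not in existing]
    if new_cols and not merge:
        raise ValueError(f"schema mismatch: {new_cols} are not in the table")
    return table_cols + new_cols


merge = merge_schema_enabled(writer_option=None, session_conf="true")
print(columns_after_write(["key", "value"], ["key", "Value", "newCol"], merge))
# ['key', 'value', 'newCol']
```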

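Automatic schema evolution for Delta Lake merge

Schema evolution allows users to resolve schema mismatches between the target and source table in merge. It handles the following two cases:

A column in the source table is not present in the target table. The new column is added to the target schema, and its values are inserted or updated using the source values.
A column in the target table is not present in the source table. The target schema is left unchanged; the values in the additional target column are either left unchanged (for UPDATE) or set to NULL (for INSERT).

Important

To use schema evolution, you must set the Spark session configuration spark.databricks.delta.schema.autoMerge.enabled to true before you run the merge command.

Note

In Databricks Runtime 7.3 LTS, merge supports schema evolution of only top-level columns, and not of nested columns.
In Databricks Runtime 12.2 and above, columns present in the source table can be specified by name in insert or update actions. In Databricks Runtime 12.1 and below, only INSERT * or UPDATE SET * actions can be used for schema evolution with merge.

Here are a few examples of the effects of merge operations with and without schema evolution.

Columns	Query (in SQL)	Behavior without schema evolution (default)	Behavior with schema evolution
Target columns: `key, value`; source columns: `key, value, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *	The table schema remains unchanged; only columns `key`, `value` are updated/inserted.	The table schema is changed to `(key, value, new_value)`. Existing records with matches are updated with the `value` and `new_value` in the source. New rows are inserted with the schema `(key, value, new_value)`.
Target columns: `key, old_value`; source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *	`UPDATE` and `INSERT` actions throw an error because the target column `old_value` is not in the source.	The table schema is changed to `(key, old_value, new_value)`. Existing records with matches are updated with the `new_value` in the source, leaving `old_value` unchanged. New records are inserted with the specified `key`, `new_value`, and `NULL` for the `old_value`.
Target columns: `key, old_value`; source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN MATCHED THEN UPDATE SET new_value = s.new_value	`UPDATE` throws an error because column `new_value` does not exist in the target table.	The table schema is changed to `(key, old_value, new_value)`. Existing records with matches are updated with the `new_value` in the source, leaving `old_value` unchanged, and unmatched records have `NULL` entered for `new_value`. See note (1).
Target columns: `key, old_value`; source columns: `key, new_value`	MERGE INTO target_table t USING source_table s ON t.key = s.key WHEN NOT MATCHED THEN INSERT (key, new_value) VALUES (s.key, s.new_value)	`INSERT` throws an error because column `new_value` does not exist in the target table.	The table schema is changed to `(key, old_value, new_value)`. New records are inserted with the specified `key`, `new_value`, and `NULL` for the `old_value`. Existing records have `NULL` entered for `new_value`, leaving `old_value` unchanged. See note (1).

(1) This behavior is available in Databricks Runtime 12.2 and above; Databricks Runtime 12.1 and below error in this condition.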
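To trace the table's first two rows, here is a toy in-memory model of `MERGE ... WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *` with schema evolution enabled. It is plain Python over lists of dicts, with `None` standing in for `NULL`; `merge_star` is hypothetical and does not model the explicit column lists of the last two rows, duplicate-match errors, or type casting.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def merge_star(target_cols: List[str], target: List[Row],
               source_cols: List[str], source: List[Row],
               key: str) -> Tuple[List[str], List[Row]]:
    """Toy model of MERGE ... ON t.key = s.key
    WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *
    with schema evolution enabled. None stands in for NULL."""
    # Source-only columns are appended to the target schema.
    cols = target_cols + [c for c in source_cols if c not in target_cols]
    source_by_key = {row[key]: row for row in source}
    matched_keys = set()
    result: List[Row] = []
    for row in target:
        new_row = {c: row.get(c) for c in cols}  # new columns start as NULL
        src = source_by_key.get(row[key])
        if src is not None:
            matched_keys.add(row[key])
            # UPDATE SET *: only columns present in the source are assigned,
            # so target-only columns keep their current values.
            new_row.update({c: src[c] for c in source_cols})
        result.append(new_row)
    for src in source:
        if src[key] not in matched_keys:
            # INSERT *: target-only columns are set to NULL.
            result.append({c: src.get(c) for c in cols})
    return cols, result


cols, rows = merge_star(
    ["key", "old_value"], [{"key": 1, "old_value": "a"}, {"key": 2, "old_value": "b"}],
    ["key", "new_value"], [{"key": 1, "new_value": "x"}, {"key": 3, "new_value": "z"}],
    key="key",
)
print(cols)  # ['key', 'old_value', 'new_value']
for row in rows:
    print(row)
```

Without schema evolution the same inputs would fail, as the table's second row notes, because `old_value` is not in the source.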

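Automatic schema evolution for arrays of structs

Delta MERGE INTO supports resolving struct fields by name and evolving schemas for arrays of structs. With schema evolution enabled, target table schemas evolve for arrays of structs, and this also works with any nested structs inside arrays.

Note

This feature is available in Databricks Runtime 9.1 and above. In Databricks Runtime 9.0 and below, implicit Spark casting is used for arrays of structs to resolve struct fields by position, and the effects of merge operations with and without schema evolution of structs in arrays are inconsistent with the behaviors of structs outside of arrays.
In Databricks Runtime 12.2 and above, struct fields present in the source table can be specified by name in insert or update commands. In Databricks Runtime 12.1 and below, only INSERT * or UPDATE SET * commands can be used for schema evolution with merge.

Here are a few examples of the effects of merge operations with and without schema evolution for arrays of structs.

Source schema	Target schema	Behavior without schema evolution (default)	Behavior with schema evolution
array<struct<b: string, a: string>>	array<struct<a: int, b: int>>	The table schema remains unchanged. Columns are resolved by name and updated or inserted.	The table schema remains unchanged. Columns are resolved by name and updated or inserted.
array<struct<a: int, c: string, d: string>>	array<struct<a: string, b: string>>	`update` and `insert` throw errors because `c` and `d` do not exist in the target table.	The table schema is changed to array<struct<a: string, b: string, c: string, d: string>>. `c` and `d` are inserted as `NULL` for existing entries in the target table. `update` and `insert` fill entries in the source table with `a` cast to string and `b` as `NULL`.
array<struct<a: string, b: struct<c: string, d: string>>>	array<struct<a: string, b: struct<c: string>>>	`update` and `insert` throw errors because `d` does not exist in the target table.	The target table schema is changed to array<struct<a: string, b: struct<c: string, d: string>>>. `d` is inserted as `NULL` for existing entries in the target table.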
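The by-name resolution in this table can be sketched as a recursive merge of the array's element struct. This toy model is a simplification under stated assumptions: types are plain strings, a nested dict is a struct, `evolve_element_struct` is a hypothetical name, and values themselves are not cast. It reproduces the resulting schemas of the three rows.

```python
from typing import Dict

# Toy struct type (assumption): field name -> type string, or a nested dict for a struct.
Struct = Dict[str, object]


def evolve_element_struct(target: Struct, source: Struct, evolve: bool) -> Struct:
    """Merge the element struct of a source array<struct> into the target's.

    Fields are matched by name. New source fields are appended when
    `evolve` is True; fields already in the target keep the target type
    (source values would be cast to it).
    """
    result = dict(target)  # target field order is preserved
    for name, src_type in source.items():
        if name not in target:
            if not evolve:
                raise ValueError(f"field {name} does not exist in the target")
            result[name] = src_type
        elif isinstance(target[name], dict) and isinstance(src_type, dict):
            result[name] = evolve_element_struct(target[name], src_type, evolve)
        elif isinstance(target[name], dict) or isinstance(src_type, dict):
            raise TypeError(f"cannot merge a struct with a non-struct for {name}")
    return result


# Second row of the table: array<struct<a: int, c: string, d: string>> into
# array<struct<a: string, b: string>>.
print(evolve_element_struct({"a": "string", "b": "string"},
                            {"a": "int", "c": "string", "d": "string"}, evolve=True))
# {'a': 'string', 'b': 'string', 'c': 'string', 'd': 'string'}
```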

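Dealing with `NullType` columns in schema updates

Because Parquet doesn't support NullType, NullType columns are dropped from the DataFrame when writing into Delta tables, but are still stored in the schema. When a different data type is received for that column, Delta Lake merges the schema to the new data type. If Delta Lake receives a NullType for an existing column, the old schema is retained and the new column is dropped during the write.

NullType is not supported in streaming. Since you must set schemas when using streaming, this should be very rare. NullType is also not accepted for complex types such as ArrayType and MapType.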
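The NullType rules above can be summarized as a toy model in plain Python. Types are represented as strings, and `NULL_TYPE`, `parquet_columns`, and `merged_column_type` are hypothetical names used only for this sketch; type changes other than those involving NullType are deliberately not modeled.

```python
from typing import Dict, List, Optional

NULL_TYPE = "NullType"  # label used by this toy model, not a Spark object


def parquet_columns(df_schema: Dict[str, str]) -> List[str]:
    """Columns that actually reach the data files: NullType ones are dropped."""
    return [name for name, t in df_schema.items() if t != NULL_TYPE]


def merged_column_type(existing: Optional[str], incoming: str) -> str:
    """Type kept in the table schema for one column after a write."""
    if incoming == NULL_TYPE:
        # Existing column: keep its old type. New column: record NullType.
        return existing if existing is not None else NULL_TYPE
    if existing is None or existing == NULL_TYPE:
        return incoming  # a concrete type replaces NullType
    if existing == incoming:
        return existing
    raise NotImplementedError("other type changes are outside this sketch")


print(parquet_columns({"id": "int", "note": NULL_TYPE}))  # ['id']
print(merged_column_type("string", NULL_TYPE))  # string
print(merged_column_type(NULL_TYPE, "int"))  # int
```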

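Replace table schema

By default, overwriting the data in a table does not overwrite the schema. When overwriting a table using mode("overwrite") without replaceWhere, you may still want to overwrite the schema of the data being written. You replace the schema and partitioning of the table by setting the overwriteSchema option to true:

    df.write.option("overwriteSchema", "true")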
            
