External Apache Hive metastore (legacy)

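This article describes how to set up Databricks clusters to connect to existing external Apache Hive metastores. It covers metastore deployment modes, the recommended network setup, and cluster configuration requirements, followed by instructions for configuring clusters to connect to an external metastore. For the Hive library versions included in Databricks Runtime, see the release notes for the relevant Databricks Runtime version.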

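Important

SQL Server does not work as the underlying metastore database for Hive 2.0 and above.
If you use Azure Database for MySQL as an external metastore, you must change the value of the lower_case_table_names property from 1 (the default) to 2 in the server-side database configuration. For details, see Identifier Case Sensitivity (https://dev.mysql.com/doc/refman/5.6/en/identifier-case-sensitivity.html).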

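Note

Using external metastores is a legacy data governance model. Databricks recommends that you upgrade to Unity Catalog. Unity Catalog simplifies the security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account. See What is Unity Catalog?.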

Hive metastore deployment modes

In a production environment, you can deploy a Hive metastore in two modes: local and remote.

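Local mode

The metastore client running inside a cluster connects to the underlying metastore database directly through JDBC.

Remote mode

Instead of connecting to the underlying database directly, the metastore client connects to a separate metastore service through the Thrift protocol. The metastore service connects to the underlying database. When you run a metastore in remote mode, DBFS is not supported.

For more details about these deployment modes, see the Hive documentation (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration).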

Note

The examples in this article use MySQL as the underlying metastore database.

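Network setup

Databricks clusters run inside a virtual private cloud (VPC). We recommend that you set up the external Hive metastore inside a new VPC and then peer the two VPCs so that clusters connect to the Hive metastore using a private IP address. The VPC peering documentation provides detailed instructions on how to peer the VPC used by Databricks clusters and the VPC where the metastore lives. After peering the VPCs, you can test network connectivity from a cluster to the metastore VPC by running the following command inside a notebook:

    %sh
    nc -vz <DNS name or private IP> <port>

where

<DNS name or private IP> is the DNS name or the private IP address of the MySQL database (for local mode) or of the metastore service (for remote mode). If you use a DNS name, make sure that it resolves to a private IP address.
<port> is the port of the MySQL database or the port of the metastore service.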
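The requirement that a DNS name resolve to a private address can be checked before running nc. The following is a minimal sketch, not part of the Databricks tooling: the function names is_private_ipv4 and resolved_ip_is_private are hypothetical, the check covers only the RFC 1918 IPv4 ranges, and it assumes getent is available on the node.

```shell
#!/bin/sh
# is_private_ipv4 ADDRESS
# Succeeds if ADDRESS falls in an RFC 1918 private IPv4 range.
# It matches on prefixes only and does not validate the whole address.
is_private_ipv4() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# resolved_ip_is_private HOST (hypothetical helper)
# Resolves HOST with getent (assumed to be installed) and checks
# the first address that comes back.
resolved_ip_is_private() {
  ip=$(getent hosts "$1" | awk '{print $1; exit}')
  [ -n "$ip" ] && is_private_ipv4 "$ip"
}
```

In a notebook %sh cell you could call resolved_ip_is_private with your metastore host name and only then run the nc test against the same host and port.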

Cluster configurations

You must set two sets of configuration options to connect a cluster to an external metastore:

Spark options configure Spark with the Hive metastore version and the JARs for the metastore client.
Hive options configure the metastore client to connect to the external metastore.

Spark configuration options

Set spark.sql.hive.metastore.version to the version of your Hive metastore and spark.sql.hive.metastore.jars as follows:

Hive 0.13: do not set spark.sql.hive.metastore.jars.

Note

Hive 1.2.0 and 1.2.1 are not the built-in metastore on Databricks Runtime 7.0 and above. If you want to use Hive 1.2.0 or 1.2.1 with Databricks Runtime 7.0 and above, follow the procedure described in Download the metastore jars and point to them.
Hive 2.3.7 (Databricks Runtime 7.0 - 9.x) or Hive 2.3.9 (Databricks Runtime 10.0 and above): set spark.sql.hive.metastore.jars to builtin.
For all other Hive versions, Databricks recommends that you download the metastore JARs and set spark.sql.hive.metastore.jars to point to the downloaded JARs, using the procedure described in Download the metastore jars and point to them.
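The version rules above can be read as a small decision table. The sketch below encodes one reading of them. The function name recommend_metastore_jars and its output words (unset, builtin, downloaded) are hypothetical. Treating a Hive 2.3.x version that is not built in for the given runtime as "all other versions" is an assumption, not something the list states explicitly.

```shell
#!/bin/sh
# recommend_metastore_jars HIVE_VERSION DBR_MAJOR
# Prints how spark.sql.hive.metastore.jars should be handled:
#   unset      - leave the option unset (Hive 0.13)
#   builtin    - set it to builtin
#   downloaded - download the JARs and point the option at them
# DBR_MAJOR is the integer major version of Databricks Runtime.
recommend_metastore_jars() {
  hive_version="$1"
  dbr_major="$2"
  case "$hive_version" in
    0.13|0.13.*)
      echo "unset"
      ;;
    2.3.7)
      # Built in on Databricks Runtime 7.0 - 9.x only.
      if [ "$dbr_major" -ge 7 ] && [ "$dbr_major" -le 9 ]; then
        echo "builtin"
      else
        echo "downloaded"   # assumption: treated as "other version"
      fi
      ;;
    2.3.9)
      # Built in on Databricks Runtime 10.0 and above.
      if [ "$dbr_major" -ge 10 ]; then
        echo "builtin"
      else
        echo "downloaded"   # assumption: treated as "other version"
      fi
      ;;
    *)
      echo "downloaded"
      ;;
  esac
}
```

For example, recommend_metastore_jars 1.2.1 7 prints downloaded, which matches the note about Hive 1.2.0 and 1.2.1 on Databricks Runtime 7.0 and above.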

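Download the metastore jars and point to them

Create a cluster with spark.sql.hive.metastore.jars set to maven and spark.sql.hive.metastore.version set to match the version of your metastore.

When the cluster is running, search the driver log and find a line like the following:

    17/11/18 22:41:19 INFO IsolatedClientLoader: Downloaded metastore jars to <path>

The directory <path> is the location of the downloaded JARs on the driver node of the cluster.

Alternatively, you can run the following code in a Scala notebook to print the location of the JARs:

    import com.typesafe.config.ConfigFactory

    // The JARs are downloaded under the JVM temporary directory.
    val path = ConfigFactory.load().getString("java.io.tmpdir")

    println(s"\nHive JARs are downloaded to the path: $path \n")

Run %sh cp -r <path> /dbfs/hive_metastore_jar (replacing <path> with your cluster's information) to copy this directory to a directory in the DBFS root called hive_metastore_jar through the DBFS client on the driver node.
Create an init script that copies /dbfs/hive_metastore_jar to the local filesystem of the node. Make the init script sleep for a few seconds before it accesses the DBFS client, so that the client is ready.
Set spark.sql.hive.metastore.jars to use this directory. If your init script copies /dbfs/hive_metastore_jar to /databricks/hive_metastore_jars/, set spark.sql.hive.metastore.jars to /databricks/hive_metastore_jars/*. The location must include the trailing /*.
Restart the cluster.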
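The init script described in the steps above could look roughly like the following. This is a sketch under stated assumptions: the function name copy_metastore_jars is hypothetical, and the 10-second default wait is an arbitrary stand-in for "a few seconds".

```shell
#!/bin/sh
# copy_metastore_jars SRC DEST [WAIT_SECONDS]
# Waits for the DBFS client, then copies the JAR directory SRC
# (for example /dbfs/hive_metastore_jar) into DEST on the local disk.
copy_metastore_jars() {
  src="$1"
  dest="$2"
  wait_seconds="${3:-10}"   # assumption: 10 seconds is long enough

  # Give the DBFS client time to become ready before touching SRC.
  sleep "$wait_seconds"

  if [ ! -d "$src" ]; then
    echo "source directory $src not found" >&2
    return 1
  fi

  # Copy the contents of SRC (not SRC itself) into DEST.
  mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# In a real init script you would end with a call such as:
# copy_metastore_jars /dbfs/hive_metastore_jar /databricks/hive_metastore_jars
```

With that destination, spark.sql.hive.metastore.jars would be set to /databricks/hive_metastore_jars/* as described in the steps above.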

Hive configuration options

This section describes options specific to Hive.

Set up an external metastore using the UI

To set up an external metastore using the Databricks UI:

Click the Clusters button on the sidebar.
Click Create Cluster.

Enter the following Spark configuration options:

Local mode

    # Hive specific configuration options.
    # spark.hadoop prefix is added to make sure these Hive specific options will propagate to the metastore client.
    spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<mysql-host>:<mysql-port>/<metastore-db>

    # Driver class name for a JDBC metastore (Runtime 3.4 and later)
    spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver

    # Driver class name for a JDBC metastore (prior to Runtime 3.4)
    # spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.jdbc.Driver

    spark.hadoop.javax.jdo.option.ConnectionUserName <mysql-username>
    spark.hadoop.javax.jdo.option.ConnectionPassword <mysql-password>

    # Spark specific configuration options
    spark.sql.hive.metastore.version <hive-version>
    # Skip this one if <hive-version> is 0.13.x.
    spark.sql.hive.metastore.jars <hive-jar-source>

Remote mode

    # Hive specific configuration option
    # spark.hadoop prefix is added to make sure these Hive specific options will propagate to the metastore client.
    spark.hadoop.hive.metastore.uris thrift://<metastore-host>:<metastore-port>

    # Spark specific configuration options
    spark.sql.hive.metastore.version <hive-version>
    # Skip this one if <hive-version> is 0.13.x.
    spark.sql.hive.metastore.jars <hive-jar-source>

Continue your cluster configuration, following the instructions in Create a cluster.
Click Create Cluster to create the cluster.

Set up an external metastore using an init script

Init scripts let you connect to an existing Hive metastore without manually setting the required configurations.

Local mode

Create the base directory in which you want to store the init script, if it does not exist. The following example uses dbfs:/databricks/scripts.

Run the following snippet in a notebook. The snippet creates the init script /databricks/scripts/external-metastore.sh in Databricks File System (DBFS). This init script writes the required configuration options to a configuration file named 00-custom-spark.conf, in a JSON-like format, under /databricks/driver/conf/ on every node of the cluster. Databricks provides default Spark configurations in the /databricks/driver/conf/spark-branch.conf file. Configuration files in the /databricks/driver/conf directory apply in reverse alphabetical order. If you want to change the name of the 00-custom-spark.conf file, make sure that it continues to apply before the spark-branch.conf file.

    dbutils.fs.put(
        "/databricks/scripts/external-metastore.sh",
        """#!/bin/sh
          |# Loads environment variables to determine the correct JDBC driver to use.
          |# "." is the POSIX form of "source", which /bin/sh may not provide.
          |. /etc/environment
          |# Quoting the label (i.e. EOF) with single quotes to disable variable interpolation.
          |cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
          |[driver] {
          |    # Hive specific configuration options for metastores in local mode.
          |    # spark.hadoop prefix is added to make sure these Hive specific options will propagate to the metastore client.
          |    "spark.hadoop.javax.jdo.option.ConnectionURL" = "jdbc:mysql://<mysql-host>:<mysql-port>/<metastore-db>"
          |    "spark.hadoop.javax.jdo.option.ConnectionUserName" = "<mysql-username>"
          |    "spark.hadoop.javax.jdo.option.ConnectionPassword" = "<mysql-password>"
          |
          |    # Spark specific configuration options
          |    "spark.sql.hive.metastore.version" = "<hive-version>"
          |    # Skip this one if <hive-version> is 0.13.x.
          |    "spark.sql.hive.metastore.jars" = "<hive-jar-source>"
          |EOF
          |
          |case "$DATABRICKS_RUNTIME_VERSION" in
          |  "")
          |     DRIVER="com.mysql.jdbc.Driver"
          |     ;;
          |  *)
          |     DRIVER="org.mariadb.jdbc.Driver"
          |     ;;
          |esac
          |# Add the JDBC driver separately since must use variable expansion to choose the correct
          |# driver version.
          |cat << EOF >> /databricks/driver/conf/00-custom-spark.conf
          |    "spark.hadoop.javax.jdo.option.ConnectionDriverName" = "$DRIVER"
          |}
          |EOF
          |""".stripMargin,
        overwrite = true
    )

Configure your cluster with the init script.
Restart the cluster.
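The local-mode init script writes its configuration file in two steps because it uses two heredoc forms: a quoted label, which writes text literally, and an unquoted label, which expands variables such as the driver class name. The following standalone sketch shows the difference. The function name write_conf_demo and the keys literal and expanded are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# write_conf_demo FILE
# Writes one line with a quoted heredoc label and appends one line
# with an unquoted label, so the two behaviours can be compared.
write_conf_demo() {
  conf_file="$1"
  DRIVER="org.mariadb.jdbc.Driver"

  # Quoted label ('EOF'): $DRIVER is written literally, nothing expands.
  cat << 'EOF' > "$conf_file"
literal = "$DRIVER"
EOF

  # Unquoted label (EOF): $DRIVER is expanded before the line is appended.
  cat << EOF >> "$conf_file"
expanded = "$DRIVER"
EOF
}
```

Calling write_conf_demo with a scratch file path and then printing the file shows the first line still containing the text $DRIVER and the second line containing the driver class name.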

Remote mode

Create the base directory in which you want to store the init script, if it does not exist. The following example uses dbfs:/databricks/scripts.

Run the following snippet in a notebook:

    dbutils.fs.put(
        "/databricks/scripts/external-metastore.sh",
        """#!/bin/sh
          |
          |# Quoting the label (i.e. EOF) with single quotes to disable variable interpolation.
          |cat << 'EOF' > /databricks/driver/conf/00-custom-spark.conf
          |[driver] {
          |    # Hive specific configuration options for metastores in remote mode.
          |    # spark.hadoop prefix is added to make sure these Hive specific options will propagate to the metastore client.
          |    "spark.hadoop.hive.metastore.uris" = "thrift://<metastore-host>:<metastore-port>"
          |
          |    # Spark specific configuration options
          |    "spark.sql.hive.metastore.version" = "<hive-version>"
          |    # Skip this one if <hive-version> is 0.13.x.
          |    "spark.sql.hive.metastore.jars" = "<hive-jar-source>"
          |
          |    # If you need to use AssumeRole, uncomment the following settings.
          |    # "spark.hadoop.fs.s3a.credentialsType" = "AssumeRole"
          |    # "spark.hadoop.fs.s3a.stsAssumeRole.arn" = "<sts-arn>"
          |}
          |EOF
          |""".stripMargin,
        overwrite = true
    )

Configure your cluster with the init script.
Restart the cluster.

Troubleshooting

Clusters do not start (due to incorrect init script settings)

If an init script for setting up the external metastore causes cluster creation to fail, configure the init script to log, and debug the init script using the logs.

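Error in SQL statement: InvocationTargetException

Error message pattern in the full exception stack trace:
```
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = [...]
```
The external metastore JDBC connection information is misconfigured. Verify the configured hostname, port, username, password, and JDBC driver class name. Also make sure that the username has the privileges required to access the metastore database.
Error message pattern in the full exception stack trace:
```
Required table missing : "`DBS`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. [...]
```
The external metastore database is not properly initialized. Verify that you created the metastore database and put the correct database name in the JDBC connection string. Then start a new cluster with the following two Spark configuration options:
```
datanucleus.schema.autoCreateTables true
datanucleus.fixedDatastore false
```
With these options, the Hive client library tries to create and initialize tables in the metastore database automatically when it tries to access them and finds them absent.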

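Error in SQL statement: AnalysisException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetastoreClient

Error message in the full exception stack trace:

    The specified datastore driver (driver name) was not found in the CLASSPATH

The cluster is configured to use an incorrect JDBC driver.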

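Setting datanucleus.autoCreateSchema to true doesn't work as expected

By default, Databricks also sets datanucleus.fixedDatastore to true, which prevents any accidental structural changes to the metastore database. As a result, the Hive client library cannot create metastore tables even if you set datanucleus.autoCreateSchema to true. This strategy is generally safer for production environments because it prevents the metastore database from being upgraded accidentally.

If you do want to use datanucleus.autoCreateSchema to help initialize the metastore database, make sure that you set datanucleus.fixedDatastore to false. You may also want to flip both flags back after initializing the metastore database to better protect your production environment.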
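The two phases described above, initialization and normal operation, can be written down as two flag sets. The following sketch only prints the option lines for each phase. The function name datanucleus_flags and the phase names init and locked are hypothetical, and how you apply the printed lines (for example, in the cluster's Spark configuration) is left to you.

```shell
#!/bin/sh
# datanucleus_flags PHASE
# Prints the two DataNucleus options for the given phase:
#   init   - let the Hive client create the metastore schema
#   locked - block structural changes after initialization
datanucleus_flags() {
  case "$1" in
    init)
      echo "datanucleus.autoCreateSchema true"
      echo "datanucleus.fixedDatastore false"
      ;;
    locked)
      echo "datanucleus.autoCreateSchema false"
      echo "datanucleus.fixedDatastore true"
      ;;
    *)
      echo "usage: datanucleus_flags init|locked" >&2
      return 1
      ;;
  esac
}
```

You would use the init values while the metastore database is being initialized and switch to the locked values afterwards.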