Register mlflow custom model, pickle fil... - Databricks - 7290

Saeid_H · ‎03-22-2023

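Dear community,

I basically want to store two pickle files, created during training, in the model registry together with my Keras model, so that when I access the model from another workspace (using mlflow.set_registry_uri()), these files are accessible as well. I am using a custom mlflow model, shown below: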

    import joblib
    import mlflow.pyfunc


    class KerasModel(mlflow.pyfunc.PythonModel):
        def __init__(self, model, tokenizer_path, label_encoder_path):
            self.model = model
            self.tokenizer_path = tokenizer_path
            self.label_encoder_path = label_encoder_path

        def _load_tokenizer(self):
            return joblib.load(self.tokenizer_path)

        def _load_label_encoder(self):
            return joblib.load(self.label_encoder_path)

        def predict(self, context, input_data):
            y_pred = self.model.predict(input_data)
            return y_pred

Here is my training script:

    import joblib
    import mlflow
    import mlflow.keras
    import mlflow.pyfunc
    from tensorflow.keras.preprocessing.text import Tokenizer
    from sklearn.preprocessing import LabelEncoder
    from tensorflow import keras

    # Load and preprocess data into train/test splits
    X_train, y_train = get_training_data()

    ########
    # do data preprocessing.....
    ########

    tokenizer_artifact_path = "/dbfs/tmp/train/tokenizer.pkl"
    joblib.dump(fitted_tokenizer, tokenizer_artifact_path)

    label_encoder_artifact_path = "/dbfs/tmp/train/label_encoder.pkl"
    joblib.dump(fitted_label_encoder, label_encoder_artifact_path)

    with mlflow.start_run() as mlflow_run:

        # Fit keras model and log model
        ########
        # build keras model.....
        ########
        model_history = model.fit(X_train, y_train)
        mlflow.keras.log_model(model, "model")

        # Log label encoder and tokenizer as artifacts
        mlflow.log_artifact(tokenizer_artifact_path)
        mlflow.log_artifact(label_encoder_artifact_path)

        # Create a PyFunc model that uses the trained Keras model and label encoder
        pyfunc_model = KerasModel(model, tokenizer_artifact_path, label_encoder_artifact_path)
        mlflow.pyfunc.log_model("custom_model", python_model=pyfunc_model)

        # Get mlflow artifact uri
        artifact_uri = mlflow_run.info.artifact_uri
        model_uri = artifact_uri + "/custom_model"

        # Register model to MLflow Model Registry if provided
        mlflow.set_registry_uri("my registery_uri")
        mlflow.register_model(model_uri, name="keras_clssification")

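The problem is that when I access this registered model from another workspace, I can load the model but not the pickle files, and it throws this error:

FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/tmp/train/label_encoder.pkl'

I am using the following code: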

    import mlflow

    mlflow.set_registry_uri("my model_registery_uri")
    model = mlflow.pyfunc.load_model("model_uri")
    unwrapped_model = model.unwrap_python_model()
    label_encoder = unwrapped_model._load_label_encoder()
    tokenizer = unwrapped_model._load_tokenizer()

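It works in the same workspace, where the path is recognized, but the other workspace has no access to that path. My question is: how can I store these two pickle files with the model, so that wherever the model goes, the files go with it?

I have checked this solution, but unfortunately I could not fully understand it.

If you could include code in your answer, I would really appreciate it!

Thanks in advance!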

Anonymous · 03-23-2023

@Saeid Hedayati:

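To store the pickle files along with the MLflow model, you can include them as artifacts when logging the model. You can modify your training script as follows: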

    import joblib
    import mlflow
    import mlflow.keras
    import mlflow.pyfunc
    from tensorflow.keras.preprocessing.text import Tokenizer
    from sklearn.preprocessing import LabelEncoder
    from tensorflow import keras

    # Load and preprocess data into train/test splits
    X_train, y_train = get_training_data()

    ########
    # do data preprocessing.....
    ########

    tokenizer_artifact_path = "/dbfs/tmp/train/tokenizer.pkl"
    joblib.dump(fitted_tokenizer, tokenizer_artifact_path)

    label_encoder_artifact_path = "/dbfs/tmp/train/label_encoder.pkl"
    joblib.dump(fitted_label_encoder, label_encoder_artifact_path)

    with mlflow.start_run() as mlflow_run:

        # Fit keras model and log model
        ########
        # build keras model.....
        ########
        model_history = model.fit(X_train, y_train)
        mlflow.keras.log_model(model, "model")

        # Log label encoder and tokenizer as artifacts
        mlflow.log_artifact(tokenizer_artifact_path)
        mlflow.log_artifact(label_encoder_artifact_path)

        # Create a PyFunc model that uses the trained Keras model and label encoder
        pyfunc_model = KerasModel(model, tokenizer_artifact_path, label_encoder_artifact_path)

        # Log the PyFunc model together with its artifacts.
        # The artifact path comes first; the wrapper is passed as python_model.
        mlflow.pyfunc.log_model(
            "custom_model",
            python_model=pyfunc_model,
            artifacts={
                "tokenizer": tokenizer_artifact_path,
                "label_encoder": label_encoder_artifact_path,
            },
        )

        # Get mlflow artifact uri
        artifact_uri = mlflow_run.info.artifact_uri
        model_uri = artifact_uri + "/custom_model"

        # Register model to MLflow Model Registry if provided
        mlflow.set_registry_uri("my registery_uri")
        mlflow.register_model(model_uri, name="keras_clssification")

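In the code above, the artifacts (i.e., the pickle files) are logged along with the PyFunc model using the mlflow.pyfunc.log_model() method. The artifacts are specified as a dictionary whose keys are the artifact names and whose values are the paths to the artifact files.

To load the model and the artifacts in another workspace, you can use the following code: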

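    import os

    import joblib
    import mlflow
    import mlflow.artifacts
    import mlflow.pyfunc

    # Load the model from the MLflow Model Registry
    mlflow.set_registry_uri("my model_registry_uri")
    model = mlflow.pyfunc.load_model(model_uri)

    # Load the artifacts. Assumption: the python_function flavor configuration
    # records each artifact's path relative to the model directory.
    local_model_dir = mlflow.artifacts.download_artifacts(artifact_uri=model_uri)
    artifact_config = model.metadata.flavors["python_function"]["artifacts"]
    tokenizer_path = os.path.join(local_model_dir, artifact_config["tokenizer"]["path"])
    label_encoder_path = os.path.join(local_model_dir, artifact_config["label_encoder"]["path"])
    tokenizer = joblib.load(tokenizer_path)
    label_encoder = joblib.load(label_encoder_path)

    # Use the PyFunc model to predict on new data
    y_pred = model.predict(input_data)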

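In the code above, we load the model and then read the artifact paths from the model metadata. We then load the artifacts with joblib.load() and use the model to predict on new data.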
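A related option is the load_context hook of mlflow.pyfunc.PythonModel. MLflow calls it when the model is loaded and passes a context whose artifacts dictionary maps each name given in artifacts={...} to a local file path. The following is only a minimal sketch under stated assumptions, not the code from this thread: KerasModelWithArtifacts and demo are hypothetical names, the standard-library pickle stands in for joblib, and a SimpleNamespace stands in for the context object that MLflow would normally supply.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace

try:
    from mlflow.pyfunc import PythonModel
except ImportError:  # keeps the sketch runnable where MLflow is not installed
    PythonModel = object


class KerasModelWithArtifacts(PythonModel):
    """Hypothetical wrapper that reads its pickle files from context.artifacts."""

    def __init__(self, model):
        self.model = model
        self.tokenizer = None
        self.label_encoder = None

    def load_context(self, context):
        # context.artifacts maps artifact names to local paths of the files
        # that were bundled with the model when it was logged.
        with open(context.artifacts["tokenizer"], "rb") as f:
            self.tokenizer = pickle.load(f)
        with open(context.artifacts["label_encoder"], "rb") as f:
            self.label_encoder = pickle.load(f)

    def predict(self, context, model_input):
        return self.model.predict(model_input)


def demo():
    """Exercise load_context with a stand-in context (assumption: no MLflow run)."""
    with tempfile.TemporaryDirectory() as tmp:
        tokenizer_path = Path(tmp) / "tokenizer.pkl"
        label_encoder_path = Path(tmp) / "label_encoder.pkl"
        # Stand-ins for the fitted tokenizer and label encoder.
        tokenizer_path.write_bytes(pickle.dumps({"hello": 1}))
        label_encoder_path.write_bytes(pickle.dumps(["negative", "positive"]))

        context = SimpleNamespace(
            artifacts={
                "tokenizer": str(tokenizer_path),
                "label_encoder": str(label_encoder_path),
            }
        )
        wrapper = KerasModelWithArtifacts(model=None)
        wrapper.load_context(context)
        return wrapper.tokenizer, wrapper.label_encoder
```

With a wrapper along these lines, the tokenizer and label encoder should be reachable through model.unwrap_python_model() after mlflow.pyfunc.load_model(...), because the files travel inside the model directory instead of living under /dbfs/tmp.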

Saeid_H · ‎03-25-2023

Hi @Suteja Kanuri,

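Thank you for your solution. I also noticed that, instead of passing the pickle file paths, I can pass the fitted_tokenizer and fitted_label_encoder objects directly to the KerasModel class. That solution worked for me. But yours looks correct as well!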
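A likely reason this works is that MLflow serializes the python_model instance itself (with cloudpickle, as far as I know), so any object stored on the instance is saved with the model, whereas a stored path is only a string. The sketch below illustrates that difference with the standard-library pickle only. It is not MLflow code: simulate_other_workspace is a hypothetical name, the list stands in for a fitted LabelEncoder, and a deleted temporary directory stands in for a /dbfs/tmp path that does not exist in the other workspace.

```python
import os
import pickle
import tempfile


def simulate_other_workspace():
    fitted_label_encoder = ["negative", "positive"]  # stand-in for a fitted LabelEncoder

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "label_encoder.pkl")
        with open(path, "wb") as f:
            pickle.dump(fitted_label_encoder, f)

        # State a wrapper carries when it stores only the file path ...
        path_state = pickle.dumps({"label_encoder_path": path})
        # ... versus when it stores the fitted object itself.
        object_state = pickle.dumps({"label_encoder": fitted_label_encoder})

    # The temporary directory is gone here, like a local path that is
    # missing in another workspace.
    restored_encoder = pickle.loads(object_state)["label_encoder"]
    stale_path = pickle.loads(path_state)["label_encoder_path"]
    return restored_encoder, os.path.exists(stale_path)
```

The object survives the round trip, while the stored path no longer points at a file. The trade-off is that the fitted objects must themselves be picklable and become part of the serialized model.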

Anonymous · 04-01-2023

@Saeid Hedayati:

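Yes, passing the tokenizer and label encoder objects directly to the KerasModel class, instead of passing their paths, is another option. I am glad to hear that this solution worked for you! If you have any other questions, please let me know.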

Saeid_H · ‎04-03-2023

Thank you @Suteja Kanuri for your support, much appreciated!
