Databricks Runtime 7.0 ML (Unsupported)

Databricks released this image in June 2020.

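Databricks Runtime 7.0 for Machine Learning provides a ready-to-go environment for machine learning and data science based on Databricks Runtime 7.0 (Unsupported). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep learning training using Horovod.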

For more information, including instructions for creating a Databricks Runtime ML cluster, see Databricks Runtime for Machine Learning.

New features and major changes

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0. For information on what's new in Databricks Runtime 7.0, including Apache Spark MLlib and SparkR, see the Databricks Runtime 7.0 (Unsupported) release notes.

GPU-aware scheduling

Databricks Runtime 7.0 ML supports GPU-aware scheduling from Apache Spark 3.0. Databricks preconfigures it for you. See GPU scheduling.

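Major changes to the ML Python environment

This section describes the major changes to the pre-installed ML Python environment compared to Databricks Runtime 6.6 ML (Unsupported). Also review the major changes to the base Python environment in Databricks Runtime 7.0 (Unsupported). For a full list of installed Python packages and their versions, see Python libraries.

Python packages upgraded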

  • tensorflow 1.15.0 -> 2.2.0

  • tensorboard 1.15.0 -> 2.2.2

  • pytorch 1.4.0 -> 1.5.0

  • xgboost 0.90 -> 1.1.1

  • sparkdl 1.6.0-db1 -> 2.1.0-db1

  • hyperopt 0.2.2.db1 -> 0.2.4.db1
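One way to confirm that a cluster really carries the upgraded versions listed above is to compare each module's `__version__` attribute with the expected value. The sketch below is illustrative only: the helper names are hypothetical, it assumes each package exposes `__version__` under the import names shown (for example `torch` for pytorch), and it uses a prefix match so that build suffixes such as `+cpu` do not cause false mismatches.

```python
import importlib

# Expected versions copied from the upgrade list above.
# Keys are import names (assumption: pytorch is imported as "torch").
EXPECTED = {
    "tensorflow": "2.2.0",
    "tensorboard": "2.2.2",
    "torch": "1.5.0",
    "xgboost": "1.1.1",
    "sparkdl": "2.1.0-db1",
    "hyperopt": "0.2.4.db1",
}


def installed_version(module_name):
    """Return module.__version__, or None if the module is missing or has no such attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_versions(expected, lookup=installed_version):
    """Map each module name to a tuple (expected, found, matches)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        # Prefix match tolerates local build suffixes on the installed version.
        matches = found is not None and found.startswith(want)
        report[name] = (want, found, matches)
    return report
```

Calling `check_versions(EXPECTED)` in a notebook cell returns one entry per package; any entry whose third element is `False` points at a missing module or a version that differs from the list.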

Python packages added

  • lightgbm: 2.3.0

  • nltk: 3.4.5

  • petastorm: 0.9.2

  • plotly: 4.5.2

Python packages removed

  • argparse

  • boto (use boto3 instead)

  • colorama

  • deprecated

  • et-xmlfile

  • fusepy

  • html5lib

  • jdcal

  • keras (use tensorflow.keras instead)

  • keras-applications (use tensorflow.keras.applications instead)

  • llvmlite

  • lxml

  • nose

  • nose-exclude

  • numba

  • openpyxl

  • pathlib2

  • ply

  • pymongo

  • singledispatch

  • tensorboardX (use torch.utils.tensorboard instead)

  • virtualenv

  • webencodings
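Four of the removed packages come with a suggested replacement. When migrating notebooks from Databricks Runtime 6.x ML, it can help to scan existing source code for imports of those packages. The following is a minimal sketch using only the standard library; the function name is hypothetical, and the import name `keras_applications` for the keras-applications package is an assumption.

```python
import ast

# Removed packages that the list above pairs with a replacement.
# Keys are top-level import names (assumption: keras-applications imports as keras_applications).
REPLACEMENTS = {
    "boto": "boto3",
    "keras": "tensorflow.keras",
    "keras_applications": "tensorflow.keras.applications",
    "tensorboardX": "torch.utils.tensorboard",
}


def find_removed_imports(source):
    """Return sorted (line, imported_module, suggested_replacement) tuples."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in REPLACEMENTS:
                hits.append((node.lineno, name, REPLACEMENTS[top_level]))
    return sorted(hits)


sample = "import boto\nfrom keras.layers import Dense\nimport boto3\n"
for line, module, replacement in find_removed_imports(sample):
    print(f"line {line}: {module} -> consider {replacement}")
```

The scan only reports top-level package matches, so `from tensorflow import keras` and `import boto3` are left alone.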

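Major changes to the ML R environment

Databricks Runtime 7.0 ML includes an unmodified version of RStudio Server Open Source v1.2.5033, for which the source code can be found on GitHub. Read more about RStudio Server on Databricks.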

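Changes to ML Spark packages, Java and Scala libraries

The following packages are upgraded. Some are upgraded to SNAPSHOT versions that are compatible with Apache Spark 3.0:

  • graphframes: 0.7.0-db1-spark2.4 -> 0.8.0-db2-spark3.0

  • spark-tensorflow-connector: 1.15.0 (Scala 2.11) -> 1.15.0 (Scala 2.12)

  • xgboost4j and xgboost4j-spark: 0.90 -> 1.0.0

  • mleap-databricks-runtime: 0.17.0-4882dc3 (SNAPSHOT)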

The following packages are removed:

  • TensorFlow (Java)

  • TensorFrames

  • Deep Learning Pipelines for Apache Spark (HorovodRunner is available in Python)

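Conda and pip commands added to support notebook-scoped Python libraries (Public Preview)

Starting with Databricks Runtime 7.0 ML, you can use the %pip and %conda commands to manage Python libraries installed in a notebook session. You can also use these commands to create a custom environment for a notebook and to reproduce that environment across notebooks. To enable this feature, set the Spark configuration spark.databricks.conda.condaMagic.enabled to true in the cluster settings. For more information, see Notebook-scoped Python libraries.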
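If cluster definitions are kept as JSON and submitted programmatically, the configuration flag above can be added to the `spark_conf` map of the cluster spec. The sketch below only builds the payload and does not call any API; the helper name and every value in the sample spec (cluster name, worker count, the extra Spark setting) are hypothetical placeholders.

```python
import json

CONDA_MAGIC_KEY = "spark.databricks.conda.condaMagic.enabled"


def enable_conda_magic(cluster_spec):
    """Return a copy of cluster_spec whose spark_conf enables %pip and %conda."""
    updated = dict(cluster_spec)
    spark_conf = dict(updated.get("spark_conf", {}))
    spark_conf[CONDA_MAGIC_KEY] = "true"
    updated["spark_conf"] = spark_conf
    return updated


# Hypothetical cluster spec; all values here are placeholders.
spec = {
    "cluster_name": "ml-notebooks",
    "num_workers": 2,
    "spark_conf": {"spark.speculation": "false"},
}
print(json.dumps(enable_conda_magic(spec), indent=2, sort_keys=True))
```

The helper copies the input rather than mutating it, so the same base spec can be reused for clusters that should not have the preview feature turned on.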

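Deprecated and unsupported features

Databricks Runtime 7.0 ML does not support table access control. If you need table access control, we recommend that you use Databricks Runtime 7.0.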

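Known issues

  • Passing the sample_input parameter to mlflow.spark.log_model to log an MLlib model in MLeap format fails with an AttributeError because of MLeap API changes. As a workaround, upgrade to MLflow 1.9.0. You can install it using notebook-scoped Python libraries or workspace libraries.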
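Because the runtime ships MLflow 1.8.0 and the workaround above needs 1.9.0, a notebook can guard the affected call with a version check. The following is a minimal sketch with hypothetical helper names; it compares only the numeric parts of the version string and ignores pre-release suffixes.

```python
import re


def version_tuple(version):
    """Turn '1.8.0' into (1, 8, 0); stops at the first piece with no leading digits."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_mlflow_upgrade(installed, minimum="1.9.0"):
    """Return True when the installed version is older than the minimum."""
    have, want = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(want))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have < want


print(needs_mlflow_upgrade("1.8.0"))
```

In a notebook, the installed version string is available as `mlflow.__version__`. With the notebook-scoped libraries feature enabled, `%pip install mlflow==1.9.0` in its own cell is one way to apply the upgrade.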

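System environment

The system environment in Databricks Runtime 7.0 ML differs from Databricks Runtime 7.0 as follows:

  • DBUtils: Databricks Runtime ML does not contain the library utility (dbutils.library). You can use the %pip and %conda commands instead. See Notebook-scoped Python libraries.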

  • For GPU clusters, the following NVIDIA GPU libraries:

    • CUDA 10.1 Update 2

    • cuDNN 7.6.5

    • NCCL 2.7.3

    • TensorRT 6.0.1

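The following sections list the libraries included in Databricks Runtime 7.0 ML that differ from those included in Databricks Runtime 7.0.

Python libraries

Databricks Runtime 7.0 ML uses Conda for Python package management and includes many popular ML packages. The following sections describe the Conda environment for Databricks Runtime 7.0 ML.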

Python on CPU clusters

name: databricks-ml
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.1=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cpuonly=1.0=0
  - cryptography=2.8=py37h1ba5d50_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cpu_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cpu
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-cpu==2.2.0
    - tensorflow-estimator==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml

Python on GPU clusters

name: databricks-ml-gpu
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - backports=1.0=py_2
  - bcrypt=3.1.7=py37h7b6447c_1
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - boto3=1.12.0=py_0
  - botocore=1.15.0=py_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=4.1.0=py_1
  - certifi=2020.4.5.2=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cloudpickle=1.3.0=py_0
  - configparser=3.7.4=py37_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cycler=0.10.0=py37_0
  - cython=0.29.15=py37he6710b0_0
  - decorator=4.4.1=py_0
  - dill=0.3.1.1=py37_1
  - docutils=0.15.2=py37_0
  - entrypoints=0.3=py37_0
  - flask=1.1.1=py_1
  - freetype=2.9.1=h8a8886c_1
  - future=0.18.2=py37_1
  - gast=0.3.3=py_0
  - gitdb2=2.0.6=py_0
  - gitpython=3.0.5=py_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gunicorn=20.0.4=py37_0
  - h5py=2.10.0=py37h7918eee_0
  - hdf5=1.10.4=hb1b8bf9_0
  - icu=58.2=he6710b0_3
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipykernel=5.1.4=py37h39e3cac_0
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - itsdangerous=1.1.0=py37_0
  - jedi=0.14.1=py37_0
  - jinja2=2.11.1=py_0
  - jmespath=0.9.4=py_0
  - joblib=0.14.1=py_0
  - jpeg=9b=h024ee3a_2
  - jupyter_client=5.3.4=py37_0
  - jupyter_core=4.6.1=py37_0
  - kiwisolver=1.1.0=py37he6710b0_0
  - krb5=1.16.4=h173b8e3_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libpng=1.6.37=hbc83047_0
  - libpq=11.2=h20c2e04_0
  - libprotobuf=3.11.4=hd408876_0
  - libsodium=1.0.16=h1bed415_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - lightgbm=2.3.0=py37he6710b0_0
  - lz4-c=1.8.1.2=h14c3975_0
  - mako=1.1.2=py_0
  - markdown=3.1.1=py37_0
  - markupsafe=1.1.1=py37h7b6447c_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_1
  - networkx=2.4=py_0
  - ninja=1.9.0=py37hfd86e86_0
  - nltk=3.4.5=py37_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - openssl=1.1.1g=h7b6447c_0
  - packaging=20.1=py_0
  - pandas=1.0.1=py37h0573a6f_0
  - paramiko=2.7.1=py_0
  - parso=0.5.2=py_0
  - patsy=0.5.1=py37_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_3
  - plotly=4.5.2=py_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - psutil=5.6.7=py37h7b6447c_0
  - psycopg2=2.8.4=py37h1ba5d50_0
  - ptyprocess=0.6.0=py37_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pynacl=1.3.0=py37h7b6447c_0
  - pyodbc=4.0.30=py37he6710b0_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - python-editor=1.0.4=py_0
  - pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - pytz=2019.3=py_0
  - pyzmq=18.1.1=py37he6710b0_0
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - retrying=1.3.3=py37_2
  - rsa=4.0=py_0
  - s3transfer=0.3.3=py37_0
  - scikit-learn=0.22.1=py37hd81dba3_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - simplejson=3.17.0=py37h7b6447c_0
  - six=1.14.0=py37_0
  - smmap2=2.0.5=py37_0
  - sqlite=3.31.1=h62c20be_1
  - sqlparse=0.3.0=py_0
  - statsmodels=0.11.0=py37h7b6447c_0
  - tabulate=0.8.3=py37_0
  - tk=8.6.8=hbc83047_0
  - torchvision=0.6.0=py37_cu101
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - unixodbc=2.3.7=h14c3975_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - websocket-client=0.56.0=py37_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zeromq=4.3.1=he6710b0_3
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0
  - pip:
    - astunparse==1.6.3
    - databricks-cli==0.11.0
    - diskcache==4.1.0
    - docker==4.2.1
    - gorilla==0.3.0
    - horovod==0.19.1
    - hyperopt==0.2.4.db1
    - keras-preprocessing==1.1.2
    - mleap==0.16.0
    - mlflow==1.8.0
    - opt-einsum==3.2.1
    - petastorm==0.9.2
    - pyarrow==0.15.1
    - pyyaml==5.3.1
    - querystring-parser==1.2.4
    - seaborn==0.10.0
    - sparkdl==2.1.0-db1
    - tensorboard==2.2.2
    - tensorboard-plugin-wit==1.6.0.post3
    - tensorflow-estimator==2.2.0
    - tensorflow-gpu==2.2.0
    - termcolor==1.1.0
    - xgboost==1.1.1
prefix: /databricks/conda/envs/databricks-ml-gpu
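The CPU and GPU environments share most entries and differ in only a few. The following is a minimal sketch for comparing two dependency lists, assuming conda entries use the `name=version=build` form and pip entries use `name==version`. The function names are hypothetical, and the two sample lists are short excerpts from the environments above rather than the full listings.

```python
def parse_conda_spec(spec):
    """Split 'name=version=build' (conda) or 'name==version' (pip) into a 3-tuple."""
    if "==" in spec:
        name, version = spec.split("==", 1)
        return name, version, None
    parts = spec.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


def diff_specs(left, right):
    """Return {name: (left_spec, right_spec)} for entries that differ; None marks absence."""
    a = {parse_conda_spec(s)[0]: s for s in left}
    b = {parse_conda_spec(s)[0]: s for s in right}
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Excerpts from the CPU and GPU environments listed above.
CPU_SAMPLE = [
    "cpuonly=1.0=0",
    "numpy=1.18.1=py37h4f9e942_0",
    "pytorch=1.5.0=py3.7_cpu_0",
    "torchvision=0.6.0=py37_cpu",
    "tensorflow-cpu==2.2.0",
]
GPU_SAMPLE = [
    "cudatoolkit=10.1.243=h6bb024c_0",
    "numpy=1.18.1=py37h4f9e942_0",
    "pytorch=1.5.0=py3.7_cuda10.1.243_cudnn7.6.3_0",
    "torchvision=0.6.0=py37_cu101",
    "tensorflow-gpu==2.2.0",
]

for name, (cpu, gpu) in diff_specs(CPU_SAMPLE, GPU_SAMPLE).items():
    print(f"{name}: cpu={cpu} gpu={gpu}")
```

Entries are keyed by package name, so a package that keeps its version but changes its build string (as pytorch and torchvision do here) is still reported.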

Spark packages containing Python modules

Spark Package    Python Module    Version
graphframes      graphframes      0.8.0-db2-spark3.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 7.0 Beta.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 7.0, Databricks Runtime 7.0 ML contains the following JARs:

Group ID                  Artifact ID                        Version
com.typesafe.akka         akka-actor_2.12                    2.5.23
ml.combust.mleap          mleap-databricks-runtime_2.12      0.17.0-4882dc3
ml.dmlc                   xgboost4j-spark_2.12               1.0.0
ml.dmlc                   xgboost4j_2.12                     1.0.0
org.mlflow                mlflow-client                      1.8.0
org.scala-lang.modules    scala-java8-compat_2.12            0.8.0
org.tensorflow            spark-tensorflow-connector_2.12    1.15.0