Load data
This section covers information about loading data specifically for ML and DL applications. For general information about loading data, see Ingest data into the Databricks Lakehouse.
Store files for data loading and model checkpointing
Machine learning applications may need to use shared storage for data loading and model checkpointing. This is particularly important for distributed deep learning.
Load tabular data
You can load tabular machine learning data from tables or files (for example, see CSV file). You can convert Apache Spark DataFrames into pandas DataFrames using the PySpark method toPandas(), and then optionally convert to NumPy format using the pandas method to_numpy().
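The conversion chain described above can be sketched as follows. This is a minimal illustration, not an official recipe: the helper name spark_to_numpy and its columns argument are hypothetical, and the function only assumes that the object passed in exposes the PySpark toPandas() method.

```python
import numpy as np
import pandas as pd


def spark_to_numpy(spark_df, columns=None):
    """Return (pandas_df, numpy_array) for a Spark DataFrame.

    spark_df: any object exposing the PySpark DataFrame.toPandas() method.
    columns:  optional list of column names to keep (hypothetical helper argument).
    """
    # toPandas() collects every row onto the driver, so this approach
    # suits data that fits in the driver's memory.
    pdf = spark_df.toPandas()
    if columns is not None:
        pdf = pdf[columns]
    # to_numpy() returns an ndarray; columns of mixed types are
    # coerced to a common dtype (often object).
    return pdf, pdf.to_numpy()
```

In a notebook where a SparkSession named spark is available, a call might look like spark_to_numpy(spark.table("my_table"), columns=["x", "y"]), where the table and column names are placeholders. Because toPandas() moves all rows to the driver, selecting or sampling rows in Spark before converting is usually worth considering for large tables.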