We want to use Kedro to control our ML pipelines in Azure Databricks.
We are querying (and joining) relatively large tables in the Databricks Lakehouse, so we would like to define those joins in the DataCatalog without loading the full source tables into memory. Something like:
scooters_query:
  type: pandas.SQLQueryDataSet
  credentials: scooters_credentials
  sql: select * from cars where gear=4
  load_args:
    index_col: [name]
Is there a way to perform this in Databricks?
from Is there a way to include an Azure Databricks Lakehouse query as a DataCatalog dataset in kedro?