Tuesday, 11 October 2022

Is there a way to include an Azure Databricks Lakehouse query as a DataCatalog dataset in kedro?

We want to use kedro to control our ML pipelines in Azure Databricks.

We query (and join) relatively large tables in the Databricks Lakehouse, so we would like to define those joins in the DataCatalog without loading the full source tables into memory. Something like:

scooters_query:
  type: pandas.SQLQueryDataSet
  credentials: scooters_credentials
  sql: select * from cars where gear=4
  load_args:
    index_col: [name]

Is there a way to do this in Databricks?
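
To make the intent concrete, here is a minimal sketch of the behaviour we are after. It uses Python's built-in sqlite3 purely as a stand-in for a Databricks SQL endpoint (an assumption for illustration, not a Databricks or kedro API), and the `makers` table, the `maker_id` column, and the `load_query` helper are hypothetical. The point is only that the join and filter run inside the database, and just the result rows reach Python:

```python
import sqlite3


def load_query(conn, sql):
    """Run the query in the database and return only the result rows as dicts."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory SQLite database standing in for the Lakehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (name TEXT, gear INTEGER, maker_id INTEGER)")
conn.execute("CREATE TABLE makers (maker_id INTEGER, maker TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("a", 4, 1), ("b", 5, 1), ("c", 4, 2)],
)
conn.executemany("INSERT INTO makers VALUES (?, ?)", [(1, "X"), (2, "Y")])

# The join and the filter are evaluated by the database engine,
# so the full source tables are never materialised on the Python side.
QUERY = """
SELECT c.name AS name, c.gear AS gear, m.maker AS maker
FROM cars AS c
JOIN makers AS m ON m.maker_id = c.maker_id
WHERE c.gear = 4
ORDER BY c.name
"""

rows = load_query(conn, QUERY)
```

Ideally the catalog entry above would behave the same way against Databricks: kedro hands the SQL to the engine and receives only the joined, filtered result.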


