Friday 5 March 2021

Azure Batch Service vs. Azure Databricks for Python Job

Let's say I have a data analysis problem (e.g. CSV data like the Iris dataset) where I want to do some data manipulation and processing with pandas and Python. My Python script is already written, and each day, when I receive a CSV file, I want it to be processed by that script in the Azure cloud, with the result written to Azure Blob Storage.
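To make the scenario concrete, the daily job might look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual script: the `species` column, the per-species mean, and the function names `process_csv` and `upload_result` are all hypothetical, and the upload assumes the `azure-storage-blob` (v12) package with a connection string in the `AZURE_STORAGE_CONNECTION_STRING` environment variable.

```python
import io
import os

import pandas as pd


def process_csv(csv_text: str) -> pd.DataFrame:
    """Hypothetical stand-in for the existing script.

    Assumes an Iris-like CSV with a 'species' column and returns the
    mean of every numeric column per species.
    """
    df = pd.read_csv(io.StringIO(csv_text))
    return df.groupby("species", as_index=False).mean(numeric_only=True)


def upload_result(df: pd.DataFrame, container: str, blob_name: str) -> None:
    """Write the result as CSV to Azure Blob Storage.

    Assumes azure-storage-blob v12 is installed; it is imported here so
    the processing step can be run and tested locally without it.
    """
    from azure.storage.blob import BlobServiceClient

    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container=container, blob=blob_name)
    # overwrite=True replaces a result left over from an earlier run
    blob.upload_blob(df.to_csv(index=False), overwrite=True)
```

Either service would essentially have to run something of this shape once per incoming file, e.g. `upload_result(process_csv(text), "results", "iris_summary.csv")`, where the container and blob names are placeholders.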

Now I have come across these links/approaches to solve this:

Does anybody have experience with both approaches for running a Python script as described above, and maybe recommendations on what to consider (pros/cons)?

Goal of this question: Which approach would you choose, a) Azure Batch Service or b) Azure Databricks, and why?

Things to consider when choosing the appropriate service:

  • price
  • convenience of setting up the solution
  • monitoring possibilities
  • possibilities to scale if the data grows or the script logic gets more complex over time
  • ease of integration with other services (e.g. storage)
  • flexibility with regard to libraries and frameworks (e.g. later on it might become a data science problem and I might want to add some H2O machine learning models to my analysis pipeline)
  • (maybe more that I did not consider ...?)


