Tuesday 3 August 2021

Queueing up workers in Dask

I have the following scenario that I need to solve with Dask scheduler and workers:

  • Dask program has N functions called in a loop (N defined by the user)

  • Each function is started with delayed(func)(args) to run in parallel.

  • When each function from the previous point starts, it triggers W workers. This is how I invoke the workers:

    futures = client.map(worker_func, worker_args)     
    worker_responses = client.gather(futures)
    

That means that I need N * W workers to run everything in parallel. The problem is that this is not optimal as it's too much resource allocation, I run it on the cloud and it's expensive. Also, N is defined by the user, so I don't know beforehand how much processing capability I need to have.

Is there a way to queue up the workers in such a way that if I define that Dask has X workers, when a worker ends then the next one starts?



from Queueing up workers in Dask

No comments:

Post a Comment