I have the following scenario that I need to solve with Dask scheduler and workers:
-
Dask program has N functions called in a loop (N defined by the user)
-
Each function is started with
delayed(func)(args)
to run in parallel. -
When each function from the previous point starts, it triggers W workers. This is how I invoke the workers:
futures = client.map(worker_func, worker_args) worker_responses = client.gather(futures)
That means that I need N * W workers to run everything in parallel. The problem is that this is not optimal as it's too much resource allocation, I run it on the cloud and it's expensive. Also, N is defined by the user, so I don't know beforehand how much processing capability I need to have.
Is there a way to queue up the workers in such a way that if I define that Dask has X workers, when a worker ends then the next one starts?
from Queueing up workers in Dask
No comments:
Post a Comment