Hemant Vishwakarma: Python multiprocessing: understanding logic behind `chunksize`

Sunday, 30 December 2018

Python multiprocessing: understanding logic behind `chunksize`

What factors determine an optimal chunksize argument to methods like multiprocessing.Pool.map()?

From the docs:

The chunksize argument is the same as the one used by the map() method. For very long iterables using a large value for chunksize can make the job complete much faster than using the default value of 1.

Example - say that I am:

Passing an iterable to .map() that has ~15 million elements;
Working on a machine with 24 cores and using the default processes = os.cpu_count() within multiprocessing.Pool().

My naive thinking is to give each of 24 workers an equally-sized chunk, i.e. 15_000_000 / 24 or 625,000. Large chunks should reduce turnover/overhead while fully utilizing all workers. But it seems that this is missing some potential downsides of giving large batches to each worker. Is this an incomplete picture, and what am I missing?

Part of my question stems from the default logic for if chunksize=None: both .map() and .starmap() call .map_async(), which looks like this:

def _map_async(self, func, iterable, mapper, chunksize=None, callback=None,
               error_callback=None):
    # ... (materialize `iterable` to list if it's an iterator)
    if chunksize is None:
        chunksize, extra = divmod(len(iterable), len(self._pool) * 4)  # ????
        if extra:
            chunksize += 1
    if len(iterable) == 0:
        chunksize = 0

What's the logic behind divmod(len(iterable), len(self._pool) * 4)? This implies that the chunksize will be closer to 15_000_000 / (24 * 4) == 156_250. What's the intention in multiplying len(self._pool) by 4?

This makes the resulting chunksize a factor of 4 smaller than my "naive logic" from above, which consists of just dividing the length of the iterable by number of workers in pool._pool.

_{Related answer that is helpful but a bit too high-level: Python multiprocessing: why are large chunksizes slower?.}

from Python multiprocessing: understanding logic behind `chunksize`

Hemant Vishwakarma

Sunday, 30 December 2018

Python multiprocessing: understanding logic behind `chunksize`

No comments:

Post a Comment