Thursday, 29 September 2022

aiohttp: fast parallel downloading of large files

I'm using aiohttp to download large files (~150MB-200MB each).

Currently, this is what I'm doing for each file:

import aiohttp
import aiofiles

async def download_file(session: aiohttp.ClientSession, url: str, dest: str):
    chunk_size = 16384
    async with session.get(url) as response:
        # Stream the body in fixed-size chunks instead of buffering the whole file in memory
        async with aiofiles.open(dest, mode="wb") as f:
            async for data in response.content.iter_chunked(chunk_size):
                await f.write(data)

I create multiple tasks from this coroutine to achieve concurrency. I'm wondering:

  1. What is the best value for chunk_size?
  2. Is calling iter_chunked(chunk_size) better than just doing data = await response.read() and writing that to disk? In that case, how can I report the download progress? (I've sketched what I mean after this list.)
  3. How many of these download tasks should I create? (Below the list I've sketched capping them with a semaphore.)
  4. Is there a way to download multiple parts of the same file in parallel, or is that something aiohttp already does? (My rough idea with Range requests is at the end.)
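
For question 2, this is roughly what I have in mind for progress reporting if I keep iter_chunked(): count the bytes written against Content-Length (assuming the server sends that header; the print is just a stand-in for a real progress callback):

import aiohttp
import aiofiles

async def download_file_with_progress(session: aiohttp.ClientSession, url: str, dest: str):
    chunk_size = 16384
    async with session.get(url) as response:
        # Content-Length can be absent (e.g. chunked transfer encoding)
        total = int(response.headers.get("Content-Length", 0))
        done = 0
        async with aiofiles.open(dest, mode="wb") as f:
            async for data in response.content.iter_chunked(chunk_size):
                await f.write(data)
                done += len(data)
                if total:
                    print(f"{dest}: {done / total:.0%}")  # stand-in progress report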
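
For question 3, I currently just gather one task per URL; if the answer is "don't run them all at once", I assume a semaphore is the usual way to cap it (the limit of 4 here is an arbitrary placeholder):

import asyncio
import aiohttp

async def download_all(urls_and_dests):
    # Cap how many downloads run at the same time; 4 is an arbitrary guess
    semaphore = asyncio.Semaphore(4)

    async def bounded_download(session, url, dest):
        async with semaphore:
            await download_file(session, url, dest)

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *(bounded_download(session, url, dest) for url, dest in urls_and_dests)
        )

# e.g. asyncio.run(download_all([("https://example.com/a.bin", "a.bin"), ...]))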
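
For question 4: as far as I can tell aiohttp doesn't split a single file into parts by itself, but I imagine it could be done manually with HTTP Range requests, assuming the server supports them (Accept-Ranges: bytes) and reports a Content-Length. Something like:

import asyncio
import aiohttp
import aiofiles

async def download_range(session, url, dest, start, end):
    # Fetch one byte range; relies on the server honouring Range (206 Partial Content)
    async with session.get(url, headers={"Range": f"bytes={start}-{end}"}) as response:
        assert response.status == 206
        data = await response.read()
    # Write this part into its slice of the pre-sized destination file
    async with aiofiles.open(dest, mode="r+b") as f:
        await f.seek(start)
        await f.write(data)

async def download_file_in_parts(session, url, dest, parts=4):
    # Find the total size first so the file can be split into ranges
    async with session.head(url) as response:
        size = int(response.headers["Content-Length"])
    # Pre-allocate the destination so each part can seek to its own offset
    async with aiofiles.open(dest, mode="wb") as f:
        await f.truncate(size)
    step = size // parts
    ranges = [
        (i * step, size - 1 if i == parts - 1 else (i + 1) * step - 1)
        for i in range(parts)
    ]
    await asyncio.gather(
        *(download_range(session, url, dest, start, end) for start, end in ranges)
    )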


from aiohttp: fast parallel downloading of large files
