Saturday, 29 August 2020

Creating large zip files in AWS S3 in chunks

So, this question ends up being about both Python and S3.

Let's say I have an S3 bucket with these files:

file1 --------- 2GB
file2 --------- 3GB
file3 --------- 1.9GB
file4 --------- 5GB

These files were uploaded using a presigned POST URL for S3.

What I need to do is give the client the ability to download them all in a ZIP (or similar), but I can't do it in memory, nor on server storage, as this is a serverless setup.

From my understanding, the server ideally needs to (see the boto3 sketch after this list):

  1. Start a multipart upload job on S3
  2. Probably send a chunk to the multipart job as the header of the zip file
  3. Download each file in the bucket chunk by chunk, in some sort of stream, so as not to overflow memory
  4. Use said stream to then create a zip chunk and send it to the multipart job
  5. Finish the multipart job and the zip file
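
Something like the following should cover steps 1, 4 and 5, at least for the multipart-upload side. A minimal sketch, assuming a hypothetical zip_chunks() generator that yields the archive in pieces of at least 5 MB each (an S3 requirement for every part except the last); the bucket and key names are made up:

  import boto3

  s3 = boto3.client('s3')
  BUCKET, KEY = 'my-bucket', 'archive.zip'  # hypothetical names

  # Step 1: start the multipart upload
  upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)['UploadId']

  # Steps 2-4: send each zip chunk as a part, remembering the ETags
  parts = []
  for number, chunk in enumerate(zip_chunks(), start=1):  # zip_chunks() is hypothetical
      part = s3.upload_part(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                            PartNumber=number, Body=chunk)
      parts.append({'ETag': part['ETag'], 'PartNumber': number})

  # Step 5: finish the multipart job
  s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                               MultipartUpload={'Parts': parts})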

Now, I honestly have no idea how to achieve this, or whether it is even possible, but some questions are:

  • How do I download a file from S3 in chunks? Preferably using boto3 or botocore (see the sketch after this list)
  • How do I create a zip file in chunks while freeing memory?
  • How do I connect this all in a multipart upload?
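
For the first question, boto3's get_object returns a streaming body that can be read in fixed-size chunks, so only one chunk is ever held in memory. A minimal sketch, with placeholder bucket/key names:

  import boto3

  s3 = boto3.client('s3')

  def iter_s3_object(bucket, key, chunk_size=1024 * 1024):
      # Yield the object's bytes one chunk at a time, never buffering it whole
      body = s3.get_object(Bucket=bucket, Key=key)['Body']
      yield from body.iter_chunks(chunk_size=chunk_size)

  for chunk in iter_s3_object('my-bucket', 'file4'):  # placeholder names
      ...  # feed the chunk into the zip stream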

Edit: Now that I think about it, maybe I don't even need to put the ZIP file in S3 at all; I can just stream it directly to the client, right? That would actually be much better.

Here's some hypothetical code, assuming my edit above:

  # Let's assume Flask
  from flask import Flask, Response

  app = Flask(__name__)

  @app.route('/download_bucket_as_zip')
  def stream_file():
      def stream():
          # Probably needs to yield zip headers/metadata?
          for file in getFilesFromBucket():       # hypothetical helper
              for chunk in file.readChunk(4000):  # hypothetical method
                  zipchunk = bytesToZipChunk(chunk)  # hypothetical helper
                  yield zipchunk
      return Response(stream(), mimetype='application/zip')
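
To make the hypothetical bytesToZipChunk part concrete: since Python 3.6, zipfile can write to an unseekable file-like object (it then emits data descriptors instead of seeking back to patch headers). So one plausible pattern, sketched below under those assumptions, is a small write-only buffer that ZipFile writes into and the generator drains; iter_s3_object is the chunked-download helper from earlier, and force_zip64=True is needed because some members here exceed 4 GB:

  import zipfile

  class StreamBuffer:
      # Write-only, unseekable sink: ZipFile writes in, the generator drains out
      def __init__(self):
          self._chunks = []
          self._pos = 0

      def write(self, data):
          self._chunks.append(bytes(data))
          self._pos += len(data)
          return len(data)

      def tell(self):
          # zipfile only needs tell(); having no seek() puts it in unseekable mode
          return self._pos

      def drain(self):
          data = b''.join(self._chunks)
          self._chunks = []
          return data

  def zip_stream(bucket, keys):
      buf = StreamBuffer()
      with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
          for key in keys:
              # force_zip64: some members here are larger than 4 GB
              with zf.open(key, mode='w', force_zip64=True) as member:
                  for chunk in iter_s3_object(bucket, key):  # helper from earlier
                      member.write(chunk)
                      yield buf.drain()  # pass compressed bytes on, freeing memory
      yield buf.drain()  # central directory is written when the ZipFile closes

The Flask view then just becomes return Response(zip_stream('my-bucket', keys), mimetype='application/zip'), and the same generator could equally feed the upload_part calls if the archive needs to land back in S3 after all.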


from Creating large zip files in AWS S3 in chunks
