So, this question ends up being about both Python and S3.
Let's say I have an S3 bucket with these files:
file1 --------- 2GB
file2 --------- 3GB
file3 --------- 1.9GB
file4 --------- 5GB
These files were uploaded to S3 using a presigned POST URL.
What I need to do is give the client the ability to download them all as a ZIP (or similar), but I can't build it in memory or on server storage, as this is a serverless setup.
From my understanding, the server would ideally need to:
- Start a multipart upload job on S3 (see the boto3 sketch after this list)
- Probably send an initial part to that multipart upload containing the header of the zip file
- Download each file in the bucket chunk by chunk, in some sort of stream, so as not to overflow memory
- Use that stream to build a zip chunk and send it as a part of the multipart upload
- Finish the multipart upload and the zip file
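For the S3 side of those steps, a minimal sketch of the multipart upload calls with boto3 could look like the following. The bucket name, key, and the zip_chunks() generator are hypothetical placeholders; one real constraint to keep in mind is that every part except the last must be at least 5 MB.

    import boto3

    s3 = boto3.client('s3')

    # Hypothetical bucket/key; zip_chunks() is a placeholder generator
    # that would yield successive pieces of the zip stream.
    mpu = s3.create_multipart_upload(Bucket='my-bucket', Key='all_files.zip')
    parts = []
    for part_number, chunk in enumerate(zip_chunks(), start=1):
        response = s3.upload_part(
            Bucket='my-bucket',
            Key='all_files.zip',
            UploadId=mpu['UploadId'],
            PartNumber=part_number,
            Body=chunk,  # every part except the last must be >= 5 MB
        )
        parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
    s3.complete_multipart_upload(
        Bucket='my-bucket',
        Key='all_files.zip',
        UploadId=mpu['UploadId'],
        MultipartUpload={'Parts': parts},
    )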
Now, I honestly have no idea how to achieve this, or whether it is even possible, but some questions are:
- How do I download a file from S3 in chunks? Preferably using boto3 or botocore (see the sketch after this list)
- How do I create a zip file in chunks while freeing memory as I go?
- How do I connect all of this to a multipart upload?
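For the first question, boto3's get_object returns a botocore StreamingBody, which can be read a bounded amount at a time, so memory use stays constant regardless of object size. A minimal sketch, where the bucket/key names and the 8 MB chunk size are hypothetical:

    import boto3

    s3 = boto3.client('s3')

    # iter_chunks() reads the body incrementally instead of all at once.
    obj = s3.get_object(Bucket='my-bucket', Key='file1')
    total = 0
    for chunk in obj['Body'].iter_chunks(chunk_size=8 * 1024 * 1024):
        total += len(chunk)  # replace with real per-chunk processing

    # A ranged GET also works, if you need explicit control over byte offsets.
    first_8mb = s3.get_object(
        Bucket='my-bucket', Key='file1', Range='bytes=0-8388607'
    )['Body'].read()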
Edit: Now that I think about it, maybe I don't even need to put the ZIP file in S3 at all; I can just stream it directly to the client, right? That would actually be much better.
Here's some hypothetical code assuming my edit above:
# Let's assume Flask
from flask import Flask, Response

app = Flask(__name__)

@app.route('/download_bucket_as_zip')
def stream_file():
    def stream():
        # Probably needs to yield zip headers/metadata first?
        for file in getFilesFromBucket():          # placeholder helper
            for chunk in file.readChunk(4000):     # placeholder helper
                zipchunk = bytesToZipChunk(chunk)  # placeholder helper
                yield zipchunk
    return Response(stream(), mimetype='application/zip')
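To make the hypothetical code above concrete, here is one possible sketch using only boto3 and the standard library. It relies on zipfile being able to write to a non-seekable file object (Python 3.6+): we hand it a small buffer, then drain and yield the buffered bytes to the client after each write, so only one chunk is ever held in memory. The bucket name, route, and chunk size are assumptions, and ZIP_STORED skips compression to keep CPU cost down.

    import zipfile

    import boto3
    from flask import Flask, Response

    app = Flask(__name__)
    s3 = boto3.client('s3')
    BUCKET = 'my-bucket'  # hypothetical bucket name


    class ChunkBuffer:
        """Non-seekable write target for ZipFile; collects bytes until drained."""

        def __init__(self):
            self._chunks = []

        def write(self, data):
            self._chunks.append(bytes(data))
            return len(data)

        def flush(self):
            pass

        def drain(self):
            data = b''.join(self._chunks)
            self._chunks = []
            return data


    @app.route('/download_bucket_as_zip')
    def download_bucket_as_zip():
        def stream():
            buffer = ChunkBuffer()
            with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_STORED, allowZip64=True) as archive:
                # list_objects_v2 returns at most 1000 keys per call;
                # use a paginator if the bucket holds more.
                for item in s3.list_objects_v2(Bucket=BUCKET).get('Contents', []):
                    body = s3.get_object(Bucket=BUCKET, Key=item['Key'])['Body']
                    # force_zip64 because entry sizes are unknown up front
                    # and may exceed 4 GB (file4 is 5 GB).
                    with archive.open(zipfile.ZipInfo(item['Key']), 'w', force_zip64=True) as entry:
                        for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
                            entry.write(chunk)
                            yield buffer.drain()  # pass finished bytes to the client
            yield buffer.drain()  # the central directory is written on close
        return Response(stream(), mimetype='application/zip')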