Saturday, 24 October 2020

Streaming decompression of an S3 gzip source object to an S3 destination object using python?

Given a large gzip object in S3, what is a memory-efficient (i.e. streaming) method in python3/boto3 to decompress the data and store the results back into another S3 object?

A similar question has been asked previously. However, all of the answers use a methodology in which the contents of the gzip file are first read into memory (e.g. into a BytesIO buffer). Those solutions are not viable for objects that are too big to fit in main memory.

For large S3 objects, the contents need to be read, decompressed "on the fly", and then written to a different S3 object in some chunked fashion. A sketch of the kind of approach I have in mind follows below.
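This is only a rough sketch of what I am imagining, not a tested solution: wrap the StreamingBody returned by get_object in gzip.GzipFile so decompression happens lazily as the stream is read, then hand that file-like object to upload_fileobj, which performs a managed multipart upload in chunks. The function name stream_decompress and the bucket/key parameters are just placeholders of mine.

import gzip

import boto3

s3 = boto3.client("s3")


def stream_decompress(src_bucket, src_key, dst_bucket, dst_key):
    """Decompress a gzip object from S3 into another S3 object without
    loading the whole payload into memory (sketch, not production code)."""
    # get_object returns a StreamingBody, a readable file-like object
    response = s3.get_object(Bucket=src_bucket, Key=src_key)

    # GzipFile wraps the stream and decompresses lazily as it is read
    with gzip.GzipFile(fileobj=response["Body"], mode="rb") as plain:
        # upload_fileobj reads the file-like object chunk by chunk and
        # performs a managed multipart upload, so only one part at a time
        # (multipart_chunksize, 8 MB by default) is buffered in memory
        s3.upload_fileobj(plain, dst_bucket, dst_key)

If I understand the transfer manager correctly, memory use would be bounded by the multipart chunk size rather than the object size, but I have not verified this against truly large objects.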

Thank you in advance for your consideration and response.



from Streaming decompression of S3 gzip source object to a S3 destination object using python?
