Wednesday, 20 October 2021

Python create SharedMemory instance using existing buffer (bytes from marshal.dumps())

I would like to create an instance of multiprocessing.shared_memory.SharedMemory and supply, from outside, the buffer it uses to hold the data.

My use case is the following:

import marshal

from multiprocessing.shared_memory import SharedMemory


data = {'foo': 1, 'bar': 'some text'}
data_bytes = marshal.dumps(data)
shm = SharedMemory(create=True, size=len(data_bytes))

# Copy the serialised bytes into the shared block with one slice assignment
shm.buf[:len(data_bytes)] = data_bytes

As you can see, I need to serialise some data so that I can later share it across multiple processes. The snippet above uses twice the memory that is actually needed, because the serialised data held in the data_bytes variable has to be copied into the SharedMemory buffer. The copy itself is also costly, since in my use case data is 1 GB in size.
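To make the use case concrete, here is a minimal sketch of the full round trip: serialise, copy into shared memory, attach by name and deserialise. The helper names publish and load are hypothetical, and the sketch assumes the payload length is passed along with the block name, since the block may be rounded up to a larger size on some platforms. It still holds the serialised bytes and the shared block at the same time until publish returns, so it does not solve the problem; it only shows what I am trying to do.

```python
import marshal
from multiprocessing.shared_memory import SharedMemory


def publish(data):
    """Serialise data and copy it into a newly created shared memory block."""
    payload = marshal.dumps(data)
    shm = SharedMemory(create=True, size=len(payload))
    # One bulk copy; both payload and the shared block exist at this point.
    shm.buf[:len(payload)] = payload
    # payload is dropped when the function returns.
    return shm, len(payload)


def load(name, length):
    """Attach to an existing block by name and deserialise its contents."""
    reader = SharedMemory(name=name)
    try:
        # Release the slice before closing, otherwise close() can fail
        # because the buffer is still exported.
        with reader.buf[:length] as view:
            return marshal.loads(view)
    finally:
        reader.close()


data = {'foo': 1, 'bar': 'some text'}
shm, length = publish(data)
try:
    restored = load(shm.name, length)
finally:
    shm.close()
    shm.unlink()
```

In the real setup load would run in the other processes, which would receive shm.name and length by some other channel.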

The only workaround I have found so far, and not a viable one, is to guess how much space the serialised data will take, allocate that much in a SharedMemory instance, and have marshal write into it, e.g.

shm = SharedMemory(create=True, size=BIG_ENOUGH_SIZE)
marshal.dump(data, shm.buf.obj)

However, if my guess is too low, marshal.dump(data, shm.buf.obj) (correctly) raises an error because there is not enough space for the serialised data.
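The failure can be reproduced with a small sketch. The helper try_dump is hypothetical, and the sketch assumes that shm.buf.obj is the underlying mmap object, whose write method raises ValueError when the data would run past the end of the block.

```python
import marshal
from multiprocessing.shared_memory import SharedMemory


def try_dump(data, size):
    """Return True if marshal.dump fits data into a block of `size` bytes."""
    shm = SharedMemory(create=True, size=size)
    try:
        # shm.buf.obj is assumed to be the mmap backing the block.
        marshal.dump(data, shm.buf.obj)
        return True
    except ValueError:
        # The write was refused because the block is too small.
        return False
    finally:
        shm.close()
        shm.unlink()


fits = try_dump({'foo': 1, 'bar': 'some text'}, 1 << 20)
too_small = not try_dump(list(range(100_000)), 16)
```

A retry loop that doubles the size on failure would work around the error, but every failed attempt serialises the data again, which is why guessing does not look viable at 1 GB.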


