Thursday 5 August 2021

Python bytes object from generator

Let's say I have a generator like

gen = (i*2 for i in range(100))

and I now want to create a bytes object containing all the values that generator yields. I could do the following:

b = bytes(gen)

My question now is: since bytes objects are immutable, how does the memory allocation work in this case? Do I have to assume that for every element the generator yields, a new bytes object is created, with the previous content plus one more element copied into it? This would be very inefficient, especially for long generators. And since the generator does not provide any length information, it seems there is no other way to pre-allocate the needed memory internally.

Then again, what would be a better way to achieve this with as little memory usage as possible? Should I fill a (mutable) bytearray first and convert that into a bytes object?

b = bytes(bytearray(gen))

Or even a list?

b = bytes(list(gen))

But that looks somehow strange and counter-intuitive...
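One way to compare the three variants without guessing at the internals is to measure their peak allocation with the standard-library tracemalloc module. This is only a minimal sketch: make_gen, peak_bytes and strategies are names made up for illustration, and the numbers it prints depend on the interpreter version and on the length of the generator.

```python
import tracemalloc


def make_gen(n=100):
    # Same shape as the generator above; values stay within 0..255 for n <= 128.
    return (i * 2 for i in range(n))


def peak_bytes(build):
    """Run one construction strategy and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = build(make_gen())
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# The three candidates from the question.
strategies = {
    "bytes(gen)": bytes,
    "bytes(bytearray(gen))": lambda g: bytes(bytearray(g)),
    "bytes(list(gen))": lambda g: bytes(list(g)),
}

results = {}
for name, build in strategies.items():
    results[name], peak = peak_bytes(build)
    print(f"{name:24} peak = {peak} bytes")
```

With only 100 elements the differences are small; calling make_gen with a larger n would need a generator whose values still fit in 0..255, for example (i % 256 for i in range(n)).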


Background: The specific generator I have reads bytes (as Python integers in 0..255) one at a time over a C-API from another module (.pyd). The overall length of the sequence is known beforehand and can be up to 2**25 bytes. My readout function should collect those values and return a bytes object, which I thought was appropriate since the data is read-only.
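Since the length is known in advance, one option I can think of is to allocate a bytearray of that size once and fill it by index, so that nothing has to grow while reading. The sketch below assumes this setup; collect_bytes is a hypothetical helper, and the plain generator stands in for the real one that reads over the C-API.

```python
def collect_bytes(source, length):
    """Fill a pre-sized buffer from `source` and return it as bytes.

    `source` is any iterable yielding integers in 0..255 (hypothetical
    stand-in for the generator that reads from the .pyd module).
    """
    buf = bytearray(length)  # single allocation, zero-filled
    filled = 0
    for value in source:
        # Raises IndexError if `source` yields more than `length` items,
        # and ValueError if a value is outside 0..255.
        buf[filled] = value
        filled += 1
    if filled != length:
        raise ValueError(f"expected {length} values, got {filled}")
    return bytes(buf)  # one final copy into an immutable object


# Usage with the example generator from above.
data = collect_bytes((i * 2 for i in range(100)), 100)
```

The final bytes(buf) still copies the buffer once, so peak usage is roughly twice the payload for a moment. If the caller can live with a mutable result, returning the bytearray itself would avoid that copy.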


