Let's say I have a generator like
gen = (i*2 for i in range(100))
and I now want to create a bytes object containing all the values that generator yields. I could do the following:
b = bytes(gen)
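For a small example this works as expected: each integer in 0..255 yielded by the generator becomes one byte of the result.

```python
# bytes() accepts any iterable of ints in range(256)
gen = (i * 2 for i in range(5))
b = bytes(gen)
print(b)  # b'\x00\x02\x04\x06\x08'
```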
My question now is: since bytes objects are immutable, how does the memory allocation work in this case? Do I have to assume that for every element the generator yields, a new bytes object is created, with the previous content plus one more element copied into it? That would be very inefficient, especially for longer generators. And since the generator does not provide any length information, there seems to be no way to pre-allocate the needed memory internally.
Then again, what would be a better way to achieve this with as little memory usage as possible? Should I use a (mutable) bytearray first and cast that into a bytes object?
b = bytes(bytearray(gen))
Or even a list?
b = bytes(list(gen))
But that looks somehow strange and counter-intuitive...
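For what it's worth, all three spellings produce the same result; here is a minimal sanity check (not a benchmark) of that equivalence. Note that a fresh generator is needed for each conversion, since a generator can only be consumed once.

```python
def make_gen():
    # a fresh generator each time -- generators are single-use
    return (i * 2 for i in range(100))

direct = bytes(make_gen())
via_bytearray = bytes(bytearray(make_gen()))
via_list = bytes(list(make_gen()))

# identical contents regardless of the intermediate container
assert direct == via_bytearray == via_list
print(len(direct))  # 100
```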
Background: the specific generator I have reads bytes (as Python integers in 0..255) one at a time over a C API from another module (.pyd), and the overall length of the sequence is known beforehand, with up to 2**25 bytes in it. My readout function should collect those and return a bytes object, which I thought was appropriate since the data is read-only.
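Since the length is known in advance here, one way to sidestep the question entirely is to pre-allocate a bytearray of that size, fill it in place, and make a single final copy into a bytes object. A minimal sketch, where `read_byte` is a hypothetical stand-in for the C-API readout call:

```python
def read_all(read_byte, length):
    """Collect `length` ints (0..255) from read_byte() into a bytes object."""
    buf = bytearray(length)      # pre-allocated, mutable buffer
    for i in range(length):
        buf[i] = read_byte()     # fill in place, no reallocation
    return bytes(buf)            # one final copy into an immutable object
```

This does one allocation up front and one copy at the end, instead of relying on whatever growth strategy `bytes(iterable)` uses internally.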
Source: "Python bytes object from generator"