I have many files, and each file is read as a matrix of shape (n, 1000), where n may differ from file to file.
I'd like to concatenate all of them into a single big Numpy array. I currently do this:
import glob
import numpy as np

dataset = np.zeros((0, 1000))  # start with no rows, 1000 columns
for f in glob.glob('*.png'):
    x = read_as_numpyarray(f)  # custom function; x is a matrix of shape (n, 1000)
    dataset = np.vstack((dataset, x))
but this is inefficient, since dataset is rebuilt on every iteration: the data already read is copied again each time the next file is stacked onto it. How can I do this in a better way with Numpy, avoiding rewriting the whole dataset in memory many times?
NB: the final big Numpy array might take 10 GB.
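For reference, a minimal sketch of the usual workaround (not from the original post): accumulate the per-file arrays in a Python list and concatenate once at the end, assuming read_as_numpyarray behaves as described above.

import glob
import numpy as np

# read_as_numpyarray is the custom reader from the question; each call
# is assumed to return an array of shape (n, 1000), with n varying per file.
blocks = [read_as_numpyarray(f) for f in glob.glob('*.png')]

# A single concatenation copies each block exactly once, instead of
# recopying the whole growing dataset on every iteration.
dataset = np.concatenate(blocks, axis=0)

The trade-off is that the individual blocks and the final array briefly coexist in memory, which matters at the 10 GB scale mentioned above.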
from Building a Numpy array by appending data (without knowing the full size in advance)