I'm using a deep CNN+LSTM network to perform classification on a dataset of 1D signals. I'm using Keras 2.2.4 with a TensorFlow 1.12.0 backend. Since I have a large dataset and limited resources, I'm using a generator to load the data into memory during training. First, I tried this generator:
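import random

def data_generator(batch_size, preproc, type, x, y):
    num_examples = len(x)
    examples = zip(x, y)
    # Sort by signal length so each batch holds signals of similar length.
    examples = sorted(examples, key=lambda x: x[0].shape[0])
    # Only full batches are built; a trailing partial batch is dropped.
    end = num_examples - batch_size + 1
    batches = [examples[i:i + batch_size] for i in range(0, end, batch_size)]
    # Batches are shuffled once, before the first epoch.
    random.shuffle(batches)
    while True:
        for batch in batches:
            x, y = zip(*batch)
            yield preproc.process(x, y)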
Using the above generator, I'm able to launch training with a mini-batch size of up to 30 samples. However, this kind of method does not guarantee that the network trains only once on each sample per epoch. Considering this comment from the Keras website:
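To make the concern concrete, here is a minimal sketch of the slicing arithmetic used in data_generator. The helper name build_batches and the sizes (103 examples, batch size 30) are made up for illustration, and plain indices stand in for the (x, y) pairs.

```python
def build_batches(num_examples, batch_size):
    """Mirror the slicing in data_generator, using indices instead of (x, y) pairs."""
    examples = list(range(num_examples))
    end = num_examples - batch_size + 1
    return [examples[i:i + batch_size] for i in range(0, end, batch_size)]

batches = build_batches(103, 30)
seen = {i for batch in batches for i in batch}

print(len(batches))      # number of full batches that get built
print(103 - len(seen))   # samples that are never yielded
```

With these numbers only full batches are built, so the trailing samples never reach the network. Because the examples are sorted by length before slicing, the dropped samples would be the longest signals.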
Sequence are a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch which is not the case with generators.
I've tried another way of loading data using the following class:
import numpy as np
from keras.utils import Sequence

class Data_Gen(Sequence):
    def __init__(self, batch_size, preproc, type, x_set, y_set):
        self.x, self.y = np.array(x_set), np.array(y_set)
        self.batch_size = batch_size
        # Shuffled sample indices; reshuffled at the end of every epoch.
        self.indices = np.arange(self.x.shape[0])
        np.random.shuffle(self.indices)
        self.type = type
        self.preproc = preproc

    def __len__(self):
        # print(self.type + ' - len : ' + str(int(np.ceil(self.x.shape[0] / self.batch_size))))
        # Number of batches per epoch, including a final partial batch.
        return int(np.ceil(self.x.shape[0] / self.batch_size))

    def __getitem__(self, idx):
        # Pick the idx-th slice of the shuffled indices.
        inds = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_x = self.x[inds]
        batch_y = self.y[inds]
        return self.preproc.process(batch_x, batch_y)

    def on_epoch_end(self):
        np.random.shuffle(self.indices)
I can confirm that with this method the network trains once on each sample per epoch, but this time, when I put more than 7 samples in a mini-batch, I get an out-of-memory error:
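The once-per-epoch property follows from the index slicing in __getitem__ and can be checked without Keras. The sketch below reproduces only that slicing; the helper name epoch_indices, the seed, and the sizes (103 examples, batch size 7) are assumptions for illustration.

```python
import numpy as np

def epoch_indices(num_examples, batch_size, rng):
    """Reproduce the index slicing of Data_Gen.__getitem__ for one epoch."""
    indices = np.arange(num_examples)
    rng.shuffle(indices)
    num_batches = int(np.ceil(num_examples / batch_size))
    return [indices[i * batch_size:(i + 1) * batch_size]
            for i in range(num_batches)]

rng = np.random.RandomState(0)
batches = epoch_indices(103, 7, rng)
flat = np.concatenate(batches)

# Every sample index appears exactly once across the epoch's batches.
print(sorted(flat.tolist()) == list(range(103)))
# The last batch is smaller than batch_size when the sizes do not divide evenly.
print(len(batches), len(batches[-1]))
```

Unlike the generator version, nothing is dropped here: the final batch is simply shorter.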
OP_REQUIRES failed at random_op.cc: 202: Resource exhausted: OOM when allocating tensor with shape...............
I can confirm that I'm using the same model architecture, configuration, and machine for this test. Why would there be a difference between these two ways of loading data?
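One visible difference between the two loaders is how samples are grouped: data_generator sorts by signal length before batching, while Data_Gen batches in random order. The sketch below is one way to measure what that does to batch sizes. It assumes preproc.process pads every signal in a batch to the longest one in that batch, which is not shown above. The helper name padded_batch_sizes and the toy lengths are hypothetical.

```python
import numpy as np

def padded_batch_sizes(lengths, batch_size, order):
    """Elements per batch if each batch is padded to its longest signal (assumed behaviour)."""
    lengths = np.asarray(lengths)
    sizes = []
    for start in range(0, len(order), batch_size):
        batch = lengths[order[start:start + batch_size]]
        sizes.append(int(len(batch) * batch.max()))
    return sizes

# Toy data: eight short signals and eight long ones.
lengths = [10] * 8 + [100] * 8

sorted_order = np.argsort(lengths, kind="stable")                  # like data_generator
random_order = np.random.RandomState(0).permutation(len(lengths))  # like Data_Gen

sorted_sizes = padded_batch_sizes(lengths, 4, sorted_order)
random_sizes = padded_batch_sizes(lengths, 4, random_order)

print(sorted_sizes)
print(random_sizes)
```

Under that assumption, length-sorted batches never need more padded elements in total than randomly ordered ones, although the single largest batch can be the same size in both cases. This is only a diagnostic to run on the real signal lengths; it does not by itself establish the cause of the OOM error.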
Please don't hesitate to ask for more details if needed.
Thanks in advance.
EDITED:
Here is the code I'm using to fit the model:
reduce_lr = keras.callbacks.ReduceLROnPlateau(
    factor=0.1,
    patience=2,
    min_lr=params["learning_rate"])
checkpointer = keras.callbacks.ModelCheckpoint(
    filepath=str(get_filename_for_saving(save_dir)),
    save_best_only=False)
batch_size = params.get("batch_size", 32)
path = './logs/run-{0}'.format(datetime.now().strftime("%b %d %Y %H:%M:%S"))
tensorboard = keras.callbacks.TensorBoard(log_dir=path, histogram_freq=0,
                                          write_graph=True, write_images=False)
if index == 0:
    print(model.summary())
    print("Model memory needed for batchsize {0} : {1} Gb".format(batch_size, get_model_memory_usage(batch_size, model)))
if params.get("generator", False):
    train_gen = load.data_generator(batch_size, preproc, 'Train', *train)
    dev_gen = load.data_generator(batch_size, preproc, 'Dev', *dev)
    valid_metrics = Metrics(dev_gen, len(dev[0]) // batch_size, batch_size)
    model.fit_generator(
        train_gen,
        # Integer division keeps the step counts whole numbers.
        steps_per_epoch=(len(train[0]) // batch_size + 1 if len(train[0]) % batch_size != 0
                         else len(train[0]) // batch_size),
        epochs=MAX_EPOCHS,
        validation_data=dev_gen,
        validation_steps=(len(dev[0]) // batch_size + 1 if len(dev[0]) % batch_size != 0
                          else len(dev[0]) // batch_size),
        callbacks=[valid_metrics, MyCallback(), checkpointer, reduce_lr, tensorboard])
    # train_gen = load.Data_Gen(batch_size, preproc, 'Train', *train)
    # dev_gen = load.Data_Gen(batch_size, preproc, 'Dev', *dev)
    # model.fit_generator(
    #     train_gen,
    #     epochs=MAX_EPOCHS,
    #     validation_data=dev_gen,
    #     callbacks=[valid_metrics, MyCallback(), checkpointer, reduce_lr, tensorboard])
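The steps_per_epoch and validation_steps expressions above amount to a ceiling division. A minimal sketch follows, with a hypothetical helper name steps_for and made-up sizes:

```python
import math

def steps_for(num_examples, batch_size):
    """Number of batches needed to cover num_examples, counting a partial last batch."""
    return int(math.ceil(num_examples / float(batch_size)))

print(steps_for(103, 30))  # sizes that do not divide evenly
print(steps_for(90, 30))   # sizes that divide evenly
```

Note that data_generator builds only full batches. When the sizes do not divide evenly, this step count is one more than the number of batches the generator holds, so the extra step wraps around and yields a batch that was already seen in the same epoch.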
from Keras difference between generator and sequence