I'm taking my first steps learning Deep Learning. I am trying to do Activity Recognition from image sequences (frames) of videos, and I am facing a problem with the training procedure.
First, let me describe the structure of my image folders:
Making Food -> p1 -> rgb_frame1.png,rgb_frame2.png ... rgb_frame200.png
Making Food -> p2 -> rgb_frame1.png,rgb_frame2.png ... rgb_frame280.png
...
...
...
Taking Medicine -> p1 -> rgb_frame1.png,rgb_frame2.png ... rgb_frame500.png
etc.
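To make it concrete, I collect the frames and labels roughly like this (the root path data/ is just illustrative):

import os
from glob import glob

import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array, load_img

ROOT = "data"  # illustrative root folder: one sub-folder per class

frames, labels = [], []
for class_name in sorted(os.listdir(ROOT)):                  # e.g. "Making Food"
    for person_dir in sorted(glob(os.path.join(ROOT, class_name, "p*"))):
        # sort numerically so rgb_frame10 does not come before rgb_frame2
        frame_paths = sorted(
            glob(os.path.join(person_dir, "rgb_frame*.png")),
            key=lambda p: int(os.path.splitext(p)[0].split("rgb_frame")[-1]),
        )
        for frame_path in frame_paths:
            frames.append(img_to_array(load_img(frame_path, target_size=(200, 200))))
            labels.append(class_name)                        # one label per frame

x_train = np.stack(frames)  # (9000, 200, 200, 3) in my case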
So the problem is that each folder can have a different number of frames, and I get confused about both the input shape of the model and the timesteps I should use. I am creating a model (shown below) with a time-distributed CNN (pre-trained VGG16) followed by an LSTM. It takes as input all the frames of all classes with the corresponding labels (in the above example, "Making Food" would be the corresponding label for p1_rgb_frame1, etc.). The final shape of x_train is (9000, 200, 200, 3), where 9000 corresponds to all frames from all classes, 200 is the height and width, and 3 is the number of channels. I reshape this data to (9000, 1, 200, 200, 3) in order to use it as input to the model. I am wondering, and worried, that I do not pass a proper timestep and that the training is therefore wrong: I get val_acc ~ 98%, but when testing on a different dataset the accuracy is much lower. Can you suggest a more efficient way to do this?
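To make the shapes concrete, this is what my reshape does, next to what I suspect a proper timestep input should look like (a sketch; the window length T = 20 is just an example):

import numpy as np

# what I currently do: every single frame becomes a "sequence" of length 1
x_train = np.zeros((9000, 200, 200, 3), dtype="float32")  # placeholder for the real frames
x_train_t1 = np.expand_dims(x_train, axis=1)              # (9000, 1, 200, 200, 3)

# what I suspect I should do: group consecutive frames of the SAME video into
# fixed-length windows so the LSTM sees real temporal context (in practice the
# windows must not cross video boundaries)
T = 20                                                    # example window length
num_windows = x_train.shape[0] // T
x_train_seq = x_train[:num_windows * T].reshape(num_windows, T, 200, 200, 3)
print(x_train_seq.shape)                                  # (450, 20, 200, 200, 3)

With windows like that, each sample has T timesteps, so the labels would also be one per window instead of one per frame. Here is the model: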
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, LSTM, TimeDistributed
from tensorflow.keras.models import Model, Sequential

# pre-trained VGG16 backbone without its classifier head
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(200, 200, 3))

x = base_model.output
x = Flatten()(x)
features = Dense(64, activation='relu')(x)
conv_model = Model(inputs=base_model.input, outputs=features)

# freeze the backbone so only the Dense and LSTM layers train
for layer in base_model.layers:
    layer.trainable = False

model = Sequential()
model.add(TimeDistributed(conv_model, input_shape=(None, 200, 200, 3)))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(16))
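I then finish the model with a softmax head and train it; a sketch of that part (num_classes, the optimizer, and the epochs are illustrative, not fixed):

num_classes = 5  # illustrative: one output unit per activity class
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# x_train reshaped to (9000, 1, 200, 200, 3); y_train one-hot, shape (9000, num_classes)
model.fit(x_train, y_train, epochs=10, batch_size=16, validation_split=0.2)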