Saturday, 5 December 2020

Image sequence training with CNN and RNN

I'm taking my first steps in deep learning. I am trying to do activity recognition from image sequences (frames) of videos, and I am facing a problem with the training procedure.

First, let me describe the structure of my image folders:

Making Food     -> p1 -> rgb_frame1.png, rgb_frame2.png, ..., rgb_frame200.png
Making Food     -> p2 -> rgb_frame1.png, rgb_frame2.png, ..., rgb_frame280.png
...
Taking Medicine -> p1 -> rgb_frame1.png, rgb_frame2.png, ..., rgb_frame500.png
etc.
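Whatever grouping is used later, the frames inside each folder have to be read in temporal order. A plain lexicographic sort misorders them once the numbers pass 9 (rgb_frame10.png would come before rgb_frame2.png), so a numeric sort is needed. A minimal sketch, assuming the rgb_frameN.png naming above:

```python
import re

def sort_frames(filenames):
    """Sort frame filenames by their embedded number; plain sorted()
    would place rgb_frame10.png before rgb_frame2.png."""
    return sorted(filenames, key=lambda n: int(re.search(r'(\d+)', n).group(1)))

names = ["rgb_frame10.png", "rgb_frame2.png", "rgb_frame1.png"]
ordered = sort_frames(names)
# ordered is ['rgb_frame1.png', 'rgb_frame2.png', 'rgb_frame10.png']
```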
      

So the problem is that each folder can have a different number of frames, so I get confused about both the input shape of the model and the timesteps I should use.

I am creating a model (shown below) with a time-distributed CNN (pre-trained VGG16) followed by an LSTM. It takes as input all the frames of all classes with the corresponding labels (in the example above, "Making Food" would be the label for every frame under p1). The final shape of x_train is (9000, 200, 200, 3), where 9000 is the total number of frames across all classes, 200 is the height and width, and 3 is the number of channels. I reshape this to (9000, 1, 200, 200, 3) to use it as input to the model.

I am worried that I am not passing a proper timestep and therefore training incorrectly: I get val_acc ~ 98%, but accuracy is much lower when testing on a different dataset. Can you suggest a more efficient way to do this?
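Note that reshaping to (9000, 1, 200, 200, 3) gives the LSTM a sequence length of 1, so it never sees any temporal context. One common way to handle folders of differing lengths is to cut each folder's frames into fixed-length, possibly overlapping windows, with one label per window rather than per frame. A minimal sketch with dummy arrays (the window length T=16 and stride 8 are illustrative assumptions, not values from the post):

```python
import numpy as np

def frames_to_windows(frames, T=16, stride=8):
    """Cut one folder's frames, shape (N, H, W, C), into overlapping
    fixed-length windows of shape (num_windows, T, H, W, C)."""
    windows = [frames[start:start + T]
               for start in range(0, len(frames) - T + 1, stride)]
    if not windows:  # folder shorter than one window
        return np.empty((0, T) + frames.shape[1:], dtype=frames.dtype)
    return np.stack(windows)

# e.g. one folder with 40 dummy 200x200 RGB frames
folder = np.zeros((40, 200, 200, 3), dtype=np.float32)
x = frames_to_windows(folder)   # shape (4, 16, 200, 200, 3)
y = np.full(len(x), 0)          # one label per window, e.g. "Making Food" -> 0
```

Concatenating the windows from every folder then yields an x_train of shape (num_sequences, 16, 200, 200, 3) that matches the TimeDistributed/LSTM model below, with real timesteps instead of 1.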

  from tensorflow.keras.applications import VGG16
  from tensorflow.keras.layers import Dense, Flatten, LSTM, TimeDistributed
  from tensorflow.keras.models import Model, Sequential

  base_model = VGG16(weights='imagenet', include_top=False,
                     input_shape=(200, 200, 3))
  for layer in base_model.layers:
      layer.trainable = False   # freeze the pre-trained VGG16 weights

  x = base_model.output
  x = Flatten()(x)
  features = Dense(64, activation='relu')(x)
  conv_model = Model(inputs=base_model.input, outputs=features)

  model = Sequential()
  model.add(TimeDistributed(conv_model, input_shape=(None, 200, 200, 3)))
  model.add(LSTM(32, return_sequences=True))
  model.add(LSTM(16))
  model.add(Dense(num_classes, activation='softmax'))  # num_classes = number of activities
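One likely cause of the 98% validation accuracy versus much lower test accuracy is splitting at the frame level: consecutive frames are nearly identical, so a random frame-wise split leaks information between train and validation. Splitting by folder, so that all windows from one recording stay on the same side, gives a more honest estimate. A minimal sketch (folder identifiers like "MakingFood/p1" are illustrative assumptions):

```python
import random

def split_by_folder(folder_ids, val_fraction=0.2, seed=0):
    """Split folder identifiers into train/val lists so that all windows
    from one recording end up on the same side of the split."""
    ids = sorted(set(folder_ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    val = set(ids[:n_val])
    train = [f for f in ids if f not in val]
    return train, sorted(val)

train_ids, val_ids = split_by_folder(
    ["MakingFood/p1", "MakingFood/p2", "TakingMedicine/p1", "TakingMedicine/p2"])
```

Windows are then built per folder and routed to train or validation according to which list the folder landed in.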


from Image sequence training with CNN and RNN
