Monday, 11 February 2019

Computing the gradients of the new state of an RNN with respect to the model parameters (including the input CNN) in TensorFlow

I have the following model in TensorFlow:

import tensorflow as tf

def model(inputs, phase_train):
    # cell_size, kernel_initializer, m_dtype, num_classes and model_summary
    # are defined elsewhere; they are omitted here for brevity.
    inputs = tf.layers.conv1d(inputs, filters=100, kernel_size=1, padding="same")
    inputs = tf.layers.batch_normalization(inputs, training=phase_train, name="bn")
    inputs = tf.nn.relu(inputs)

    with tf.variable_scope('lstm_model'):
        cell = tf.nn.rnn_cell.GRUCell(cell_size, kernel_initializer=kernel_initializer)
        # output: shape=[batch_size, time_steps, cell_size]
        output, new_state = tf.nn.dynamic_rnn(cell, inputs, dtype=m_dtype)

    with tf.variable_scope("output"):
        output = tf.reshape(output, shape=[-1, cell_size])
        output = tf.layers.dense(output, units=num_classes,
                                 kernel_initializer=kernel_initializer)

        # the leading dimension differs between training (34) and evaluation (14)
        output = tf.cond(phase_train,
                         lambda: tf.reshape(output, shape=[34, -1, num_classes]),
                         lambda: tf.reshape(output, shape=[14, -1, num_classes]))

    return output, new_state, model_summary
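
The tensors used below come from calling this function inside an outer model variable scope (the scope name is inferred from the variable names printed further down):

with tf.variable_scope('model'):
    output, new_state, model_summary = model(inputs, phase_train)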

Now, when I attempt to compute the gradients of new_state with respect to the model parameters:

grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables())

I get the following error when fetching the gradients in a session:

TypeError: Fetch argument None has invalid type <class 'NoneType'>
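
As far as I understand, tf.gradients returns None for variables that are not connected to the target, and session.run cannot fetch None. That explains the crash but not the missing gradients; the crash itself can be avoided by fetching only the defined entries (sess and feed_dict are assumed to be set up elsewhere):

# Workaround for the fetch error only: skip the None entries.
defined_grads = [g for g in grads_new_state_wrt_vars if g is not None]
grad_values = sess.run(defined_grads, feed_dict=feed_dict)

(On TensorFlow 1.12+, tf.gradients also accepts unconnected_gradients=tf.UnconnectedGradients.ZERO, which returns zero tensors instead of None.)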

Note that when I print out the gradient tensors, I get the following:

for g in grads_new_state_wrt_vars:
    print('**', g)

** None
** None
** None
** None
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(220, 240), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(240,), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(220, 120), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(120,), dtype=float64)
** None
** None
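
Since tf.gradients returns the gradients in the same order as the variables passed to it, each None can be matched to its variable, e.g.:

# Pair each gradient with its variable to see which weights get no gradient
# (tf.gradients preserves the order of its second argument).
for var, grad in zip(tf.trainable_variables(), grads_new_state_wrt_vars):
    print(var.name, '->', grad if grad is None else grad.shape)

Lined up this way, the four None entries at the top correspond to the conv and batch-norm variables, and the two at the bottom to the output dense layer.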

Finally, the trainable variables in the network are listed below:

for v in tf.trainable_variables():
    print(v.name)

model/conv1d/kernel:0
model/conv1d/bias:0
model/bn/gamma:0
model/bn/beta:0
model/lstm_model/rnn/gru_cell/gates/kernel:0
model/lstm_model/rnn/gru_cell/gates/bias:0
model/lstm_model/rnn/gru_cell/candidate/kernel:0
model/lstm_model/rnn/gru_cell/candidate/bias:0
model/output/dense/kernel:0
model/output/dense/bias:0

So why can't the gradients be computed with respect to the weights of the first conv and batch-norm layers in the network? (The two None entries for the dense layer make sense, since new_state does not depend on the output layer, but the conv and batch-norm layers feed directly into the RNN.)

Note that I don't have the same problem when I replace new_state with output.
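
One check I can think of is to differentiate new_state directly against the tensor feeding dynamic_rnn. Here rnn_input is a name I am introducing for the tf.nn.relu result in the model function above; it is not in the original code:

# Hypothetical connectivity check: rnn_input would be the tf.nn.relu
# output from the model above, kept in a separate variable for this test.
grads_wrt_rnn_input = tf.gradients(new_state, [rnn_input])
print(grads_wrt_rnn_input)  # None here would mean new_state is not
                            # connected to the RNN input at all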

Any help is much appreciated!!



