I have the following model in TensorFlow:
inputs = tf.layers.conv1d(inputs, filters=100, kernel_size=1, padding="same")
inputs = tf.layers.batch_normalization(inputs, training=phase_train, name="bn")
inputs = tf.nn.relu(inputs)

with tf.variable_scope('lstm_model'):
    cell = tf.nn.rnn_cell.GRUCell(cell_size, kernel_initializer=kernel_initializer)
    # output: shape=[1, time_steps, 32]
    output, new_state = tf.nn.dynamic_rnn(cell, inputs, dtype=m_dtype)

with tf.variable_scope("output"):
    output = tf.reshape(output, shape=[-1, cell_size])
    output = tf.layers.dense(output, units=num_classes,
                             kernel_initializer=kernel_initializer)
    output = tf.cond(phase_train,
                     lambda: tf.reshape(output, shape=[34, -1, num_classes]),
                     lambda: tf.reshape(output, shape=[14, -1, num_classes]))

return output, new_state, model_summary
Now when I attempt to compute the gradients of new_state with respect to the model parameters, I get an error:
grads_new_state_wrt_vars = tf.gradients(new_state, tf.trainable_variables())
and the error:
TypeError: Fetch argument None has invalid type <class 'NoneType'>
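For reference, the TypeError is raised at fetch time, since session.run cannot fetch None. A minimal sketch that fetches only the gradients tf.gradients actually produced (the session sess and the feed dict feed are assumed to exist elsewhere):

# Drop the None entries before fetching; sess and feed are assumptions here.
non_none_grads = [g for g in grads_new_state_wrt_vars if g is not None]
grad_vals = sess.run(non_none_grads, feed_dict=feed)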
Please note that when I printed out the gradient tensors, I got the following:
for g in grads_new_state_wrt_vars:
    print('**', g)
** None
** None
** None
** None
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul/Enter_grad/b_acc_3:0", shape=(220, 240), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd/Enter_grad/b_acc_3:0", shape=(240,), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/MatMul_1/Enter_grad/b_acc_3:0", shape=(220, 120), dtype=float64)
** Tensor("gradients/model/lstm_model/rnn/while/gru_cell/BiasAdd_1/Enter_grad/b_acc_3:0", shape=(120,), dtype=float64)
** None
** None
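To make the mapping explicit, each gradient can be paired with its variable, since tf.gradients returns the gradients in the same order as the variables passed to it; a small sketch:

# Print each trainable variable next to its gradient (or None),
# so it is clear which parameters received no gradient from new_state.
for v, g in zip(tf.trainable_variables(), grads_new_state_wrt_vars):
    print(v.name, '->', g)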
Finally, the weights in the network are printed below:
for v in tf.trainable_variables():
    print(v.name)
model/conv1d/kernel:0
model/conv1d/bias:0
model/bn/gamma:0
model/bn/beta:0
model/lstm_model/rnn/gru_cell/gates/kernel:0
model/lstm_model/rnn/gru_cell/gates/bias:0
model/lstm_model/rnn/gru_cell/candidate/kernel:0
model/lstm_model/rnn/gru_cell/candidate/bias:0
model/output/dense/kernel:0
model/output/dense/bias:0
Therefore, why can't the gradients be computed with respect to the weights of the first conv and batch-norm layers in the network?
Please note that I don't have the same problem when replacing new_state with output.
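That is, the following call yields a non-None gradient for every trainable variable (assuming output is the tensor returned by the model; the name grads_output_wrt_vars is ours):

# Differentiating the model output instead of the final RNN state
# produces a gradient for every trainable variable, per the note above.
grads_output_wrt_vars = tf.gradients(output, tf.trainable_variables())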
Any help is much appreciated!!