I'm training an LSTM network with TensorFlow in Python and wanted to switch to tf.contrib.cudnn_rnn.CudnnLSTM for faster training. What I did was replace
cells = tf.nn.rnn_cell.LSTMCell(self.num_hidden)
initial_state = cells.zero_state(self.batch_size, tf.float32)
rnn_outputs, _ = tf.nn.dynamic_rnn(cells, my_inputs, initial_state=initial_state)
with
lstm = tf.contrib.cudnn_rnn.CudnnLSTM(1, self.num_hidden)
rnn_outputs, _ = lstm(my_inputs)
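One thing I'm not sure about: the CudnnLSTM docstring describes its input as a time-major [time_len, batch_size, input_size] tensor, while dynamic_rnn defaults to batch-major [batch_size, time_len, input_size]. I'm currently passing the same my_inputs tensor to both. If the layout matters, the call would presumably need something like this (hypothetical, not in my current code):
time_major_inputs = tf.transpose(my_inputs, [1, 0, 2])  # [batch, time, feat] -> [time, batch, feat]
rnn_outputs, _ = lstm(time_major_inputs)
rnn_outputs = tf.transpose(rnn_outputs, [1, 0, 2])  # back to batch-major for the rest of the graph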
I'm seeing a significant training speedup (more than 10x), but at the same time my performance metric drops. AUC on a binary classification task is 0.741 when using LSTMCell and 0.705 when using CudnnLSTM. I'm wondering whether I'm doing something wrong or whether it's a difference between the two implementations, and if it's the latter, how to get my performance back while still using CudnnLSTM.
The training dataset has 15,337 sequences of varying length (up to a few hundred elements) that are padded with zeros to the same length within each batch. All the code is identical, including the TF Dataset API pipeline and all evaluation metrics. I ran each version a few times, and in all cases it converges around those values.
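For reference, the padding happens in the Dataset pipeline, roughly like this (a simplified sketch — generate_examples and num_features are placeholders, not my exact code):
import tensorflow as tf

# Each element is (sequence [time, num_features], label); sequences vary in
# length and each batch is zero-padded to its longest sequence.
dataset = tf.data.Dataset.from_generator(
    generate_examples,  # placeholder for my actual example generator
    output_types=(tf.float32, tf.int32),
    output_shapes=([None, num_features], []))
dataset = dataset.padded_batch(
    self.batch_size,
    padded_shapes=([None, num_features], []))  # pads with zeros by default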
Moreover, I have a few other datasets that can be plugged into exactly the same model, and the problem persists on all of them.
In the TensorFlow source for cudnn_rnn I found a comment saying:
Cudnn LSTM and GRU are mathematically different from their tf counterparts.
But there's no explanation of what those differences really are...
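The only concrete difference I've spotted myself (and I'm not sure it explains a gap this large) is the forget-gate bias: tf.nn.rnn_cell.LSTMCell adds forget_bias=1.0 by default, while CudnnLSTM doesn't seem to expose an equivalent parameter:
cells = tf.nn.rnn_cell.LSTMCell(self.num_hidden, forget_bias=1.0)  # 1.0 is the default
# CudnnLSTM has no forget_bias argument, so (if I read the code right)
# its forget gate starts from a zero-initialized bias instead.
lstm = tf.contrib.cudnn_rnn.CudnnLSTM(1, self.num_hidden)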
from Different results while training with CudnnLSTM compared to regular LSTMCell in Tensorflow