I am trying to modify Keras code written for a sequence-to-sequence Transformer so that it keeps only the encoder and skips the decoder. Specifically, at line 414 of the file I changed the following lines:
self.encoder = SelfAttention(d_model, d_inner_hid, n_head, layers, dropout)
#self.decoder = Decoder(d_model, d_inner_hid, n_head, layers, dropout)
self.target_layer = TimeDistributed(Dense(o_tokens.num(), use_bias=False))
And at line 434 of the same file:
enc_output = self.encoder(src_emb, src_seq, active_layers=active_layers)
#dec_output = self.decoder(tgt_emb, tgt_seq, src_seq, enc_output, active_layers=active_layers)
final_output = self.target_layer(enc_output)
The input to final_output should be the output of the encoder. When I run the code this way, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x == y did not hold element-wise:] [x (SparseSoftmaxCrossEntropyWithLogits_1/Shape_1:0) = ] [32 114] [y (SparseSoftmaxCrossEntropyWithLogits_1/strided_slice:0) = ] [32 115] [[ = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](SparseSoftmaxCrossEntropyWithLogits_1/assert_equal/All, SparseSoftmaxCrossEntropyWithLogits_1/assert_equal/Assert/Assert/data_0, SparseSoftmaxCrossEntropyWithLogits_1/assert_equal/Assert/Assert/data_1, SparseSoftmaxCrossEntropyWithLogits_1/assert_equal/Assert/Assert/data_2, SparseSoftmaxCrossEntropyWithLogits_1/Shape, SparseSoftmaxCrossEntropyWithLogits_1/assert_equal/Assert/Assert/data_4, SparseSoftmaxCrossEntropyWithLogits_1/strided_slice)]]
The error is raised when I fit the model on the data in pinyin_main.py, at line 34:
s2s.model.fit_generator(gen, steps_per_epoch=2000, epochs=5, callbacks=[lr_scheduler, model_saver])
Furthermore, I have commented out the try/except block at lines 25-26 of pinyin_main.py, which loads the saved weights.
I am sure that I am missing a lot of things, but I am very puzzled by how the model works. Can anyone familiar with the Transformer approach help?
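For reference, the assertion in the traceback compares two shapes: as far as I can tell, sparse softmax cross-entropy expects the labels to have the same shape as the logits with the last (class) axis dropped. The sketch below only mimics that comparison in plain Python. The helper name check_sparse_xent_shapes and the vocabulary size of 3000 are made up for illustration; only the shapes [32 114] and [32 115] come from the error message.

```python
# Hypothetical helper (not part of TensorFlow or of the repository): it mimics
# the shape comparison behind the "Condition x == y did not hold" assertion.
def check_sparse_xent_shapes(labels_shape, logits_shape):
    """Return True when labels match logits on every axis except the class axis."""
    return tuple(labels_shape) == tuple(logits_shape[:-1])


# Shapes taken from the error message; the vocabulary size 3000 is an assumption.
labels_shape = (32, 114)        # x in the message: batch of 32, 114 label positions
logits_shape = (32, 115, 3000)  # y is this shape without its last axis: [32 115]

shapes_match = check_sparse_xent_shapes(labels_shape, logits_shape)
print(shapes_match)  # the two shapes differ by one position on the time axis
```

Read this way, the predictions cover 115 positions per sequence while the labels cover only 114.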
EDIT:
I guess I need to modify the compile function in transformer.py. For example, lines 419-420:
src_seq_input = Input(shape=(None,), dtype='int32')
tgt_seq_input = Input(shape=(None,), dtype='int32')
Here src_seq_input could represent the input tensor and tgt_seq_input the target tensor. The encoder output enc_output would then replace the target_output. With that modification, and given the example data as input, training proceeds. However, my question is: does it make sense?
enc_output = self.encoder(src_emb, src_seq, active_layers=active_layers)
#dec_output = self.decoder(tgt_emb, tgt_seq, src_seq, enc_output, active_layers=active_layers)
#final_output = self.target_layer(enc_output)
def get_loss(y_pred, y_true):
...
def get_accu(y_pred, y_true):
...
loss = get_loss(enc_output, tgt_true)
self.ppl = K.exp(loss)
self.accu = get_accu(enc_output, tgt_true)
self.model = Model([src_seq_input, tgt_seq_input], enc_output)
self.model.add_loss([loss])
What I figured out is that the input and output must always have the same size, so I am trying to work out how to change that. I am struggling to understand how the size of gen is passed to the SelfAttention model in the Transformer model.
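One possible source of the off-by-one size, sketched under assumptions: in typical sequence-to-sequence training the target is shifted by one token, so the labels are one position shorter than the full target sequence. An encoder-only model emits one output per source position, so it would need unshifted labels of the same length as the source. The token ids and both helper names below are hypothetical, and the sketch assumes that source and target have equal length in each example, which the code above does not confirm.

```python
def shift_for_decoder(tgt_seq):
    """Teacher forcing: decoder input drops the last token, labels drop the first."""
    return tgt_seq[:-1], tgt_seq[1:]


def labels_for_encoder_only(tgt_seq):
    """Without a decoder, keep one label per source position (no shift)."""
    return list(tgt_seq)


# Made-up token ids; both sequences have length 5 (assumption: equal lengths).
src_seq = [2, 11, 12, 13, 3]
tgt_seq = [2, 21, 22, 23, 3]

dec_in, shifted_labels = shift_for_decoder(tgt_seq)
unshifted_labels = labels_for_encoder_only(tgt_seq)

print(len(src_seq), len(shifted_labels), len(unshifted_labels))
```

If the labels in the modified model are still shifted, they end up one position shorter than the encoder output, which would match the 114 versus 115 in the error.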