Monday, 11 September 2023

Keras Transformers - Dimensions must be equal

I wanted to do NER with a Keras model using transformers. The example worked correctly, but I wanted to add some context to each word to help the model be more accurate. By context I mean "coordinate X", "coordinate Y", "width of the word", "height of the word", "page index", ... For example, some information usually sits in the top right corner of a document, so having the coordinates of the word might help (I'm new to ML, so feel free to tell me if I'm wrong).

To provide this "context", I transformed x_train and x_val into this format:

[
    [
        [pageIndex, wordVocabId, x, y, width, height, ocrScore],
        [pageIndex, wordVocabId, x, y, width, height, ocrScore],
        ...
    ],
    [
        [pageIndex, wordVocabId, x, y, width, height, ocrScore],
        [pageIndex, wordVocabId, x, y, width, height, ocrScore],
        ...
    ],
    ...
]

Each second-level array represents a document, and each third-level array represents a word with its context. Each third-level array is a numpy array of numbers.
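As a rough illustration of this layout, the sketch below stacks per-document word lists into one rectangular array of shape (documents, words, 7). The helper name `pad_documents`, the zero padding, and the sample values are all made up for the example; documents of different lengths have to be padded (or truncated) to a common length before `tf.convert_to_tensor` can accept them.

```python
import numpy as np

N_FEATURES = 7  # pageIndex, wordVocabId, x, y, width, height, ocrScore


def pad_documents(docs, maxlen, pad_value=0.0):
    """Stack per-document word arrays into one (n_docs, maxlen, 7) array.

    Hypothetical helper: shorter documents are padded with pad_value,
    longer ones are truncated to maxlen words.
    """
    out = np.full((len(docs), maxlen, N_FEATURES), pad_value, dtype=np.float32)
    for i, words in enumerate(docs):
        words = np.asarray(words, dtype=np.float32).reshape(-1, N_FEATURES)[:maxlen]
        out[i, : len(words)] = words
    return out


# Two made-up documents with 2 and 1 words respectively.
docs = [
    [
        [0, 12, 0.10, 0.05, 0.08, 0.02, 0.99],
        [0, 57, 0.20, 0.05, 0.06, 0.02, 0.95],
    ],
    [
        [1, 3, 0.70, 0.90, 0.10, 0.02, 0.80],
    ],
]

x = pad_documents(docs, maxlen=4)
print(x.shape)  # (2, 4, 7)
```

The last axis has size 7, which is the same 7 that shows up in the error message further down.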

I tried editing the model to make it work, but I don't think I went in the right direction, so I'll post the model from the Keras example that I'm trying to use and would like to adapt to my use case:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers


    class TransformerBlock(layers.Layer):
        def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
            super().__init__()
            self.att = keras.layers.MultiHeadAttention(
                num_heads=num_heads, key_dim=embed_dim
            )
            self.ffn = keras.Sequential(
                [
                    keras.layers.Dense(ff_dim, activation="relu"),
                    keras.layers.Dense(embed_dim),
                ]
            )
            self.layernorm1 = keras.layers.LayerNormalization(epsilon=1e-6)
            self.layernorm2 = keras.layers.LayerNormalization(epsilon=1e-6)
            self.dropout1 = keras.layers.Dropout(rate)
            self.dropout2 = keras.layers.Dropout(rate)

        def call(self, inputs, training=False):
            attn_output = self.att(inputs, inputs)
            attn_output = self.dropout1(attn_output, training=training)
            out1 = self.layernorm1(inputs + attn_output)
            ffn_output = self.ffn(out1)
            ffn_output = self.dropout2(ffn_output, training=training)
            return self.layernorm2(out1 + ffn_output)
        

    class TokenAndPositionEmbedding(layers.Layer):
        def __init__(self, maxlen, vocab_size, embed_dim):
            super().__init__()
            self.token_emb = keras.layers.Embedding(
                input_dim=vocab_size, output_dim=embed_dim
            )
            self.pos_emb = keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

        def call(self, inputs):
            maxlen = tf.shape(inputs)[-1]
            positions = tf.range(start=0, limit=maxlen, delta=1)
            position_embeddings = self.pos_emb(positions)
            token_embeddings = self.token_emb(inputs)
            return token_embeddings + position_embeddings

    class NERModel(keras.Model):
        def __init__(
            self, num_tags, vocab_size, maxlen=128, embed_dim=32, num_heads=2, ff_dim=32
        ):
            super().__init__()
            self.embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
            self.transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
            self.dropout1 = layers.Dropout(0.1)
            self.ff = layers.Dense(ff_dim, activation="relu")
            self.dropout2 = layers.Dropout(0.1)
            self.ff_final = layers.Dense(num_tags, activation="softmax")

        def call(self, inputs, training=False):
            x = self.embedding_layer(inputs)
            x = self.transformer_block(x)
            x = self.dropout1(x, training=training)
            x = self.ff(x)
            x = self.dropout2(x, training=training)
            x = self.ff_final(x)
            return x
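One possible direction for adapting the embedding step, offered only as a sketch and not as the Keras example's own approach: embed just the `wordVocabId` column as a token, and project the remaining six numeric values to `embed_dim` with a `Dense` layer before adding everything together. The class name `ContextEmbedding` and the `feat_proj` layer are hypothetical, and the code assumes the input is a float tensor of shape (batch, words, 7) with the vocab id in column 1, as in the format described earlier.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


class ContextEmbedding(layers.Layer):
    """Hypothetical variant of TokenAndPositionEmbedding for 7-value words."""

    def __init__(self, maxlen, vocab_size, embed_dim):
        super().__init__()
        self.token_emb = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = layers.Embedding(input_dim=maxlen, output_dim=embed_dim)
        # Assumption: a linear projection is enough to mix in the numeric context.
        self.feat_proj = layers.Dense(embed_dim)

    def call(self, inputs):
        # inputs: (batch, words, 7) floats; column 1 holds wordVocabId.
        word_ids = tf.cast(inputs[..., 1], tf.int32)
        # Everything except the vocab id: pageIndex, x, y, width, height, ocrScore.
        features = tf.concat([inputs[..., :1], inputs[..., 2:]], axis=-1)
        # Positions run along the word axis (axis 1), not the feature axis.
        seq_len = tf.shape(inputs)[1]
        positions = tf.range(start=0, limit=seq_len, delta=1)
        return (
            self.token_emb(word_ids)
            + self.pos_emb(positions)
            + self.feat_proj(features)
        )


# Quick shape check on random data (values are made up).
layer = ContextEmbedding(maxlen=516, vocab_size=20000, embed_dim=32)
x_demo = np.random.rand(2, 516, 7).astype("float32")
x_demo[..., 1] = np.random.randint(0, 20000, size=(2, 516))
out = layer(x_demo)
print(out.shape)  # (2, 516, 32)
```

With an output of shape (batch, words, embed_dim), the rest of `NERModel` would produce one tag distribution per word, which is the shape that y_train expects. Whether the raw coordinates should be normalized first is left open here.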

I compile and fit it this way:

    print(len(tag_mapping), vocab_size, len(x_train), len(y_train))
    model = NERModel(len(tag_mapping), vocab_size, embed_dim=32, num_heads=4, ff_dim=64)
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(tf.convert_to_tensor(x_train), tf.convert_to_tensor(y_train), validation_data=(x_val, y_val), epochs=10)
    model.save("model.keras")

The print outputs the following (I have only 3 tags for now because I'm first trying to get the model working):

3 20000 1000 1000

The format of my y_train is as follows:

[
    [tagId_document1_word1, tagId_document1_word2, ...],
    [tagId_document2_word1, tagId_document2_word2, ...]
]

When I run model.fit, I get this error:

    ValueError: Dimensions must be equal, but are 516 and 7 for '{{node Equal}} = Equal[T=DT_FLOAT, incompatible_shape_error=true](Cast_1, Cast_2)' with input shapes: [?,516], [?,516,7].
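The two shapes in the message can be reproduced with plain numpy, under the assumption that the accuracy metric takes the argmax over the last axis of the prediction and compares it element-wise with the labels. If all 7 values of each word go through the token embedding, the model output would plausibly have shape (batch, 516, 7, num_tags) instead of (batch, 516, num_tags). The sizes below mirror the error message; the helper `shapes_compatible` is only for illustration.

```python
import numpy as np

batch, seq_len, n_features, num_tags = 4, 516, 7, 3


def shapes_compatible(a, b):
    """Return True if two shapes can be broadcast for an element-wise compare."""
    try:
        np.broadcast_shapes(a, b)
        return True
    except ValueError:
        return False


# Labels: one tag id per word, as in y_train.
y_true = np.zeros((batch, seq_len), dtype=np.int64)

# Assumed output when each of the 7 values is embedded as its own token.
y_pred_4d = np.random.rand(batch, seq_len, n_features, num_tags)
pred_ids_4d = y_pred_4d.argmax(axis=-1)
print(y_true.shape, pred_ids_4d.shape)  # (4, 516) (4, 516, 7)
print(shapes_compatible(y_true.shape, pred_ids_4d.shape))  # False

# Output with one tag distribution per word, which matches the labels.
y_pred_3d = np.random.rand(batch, seq_len, num_tags)
pred_ids_3d = y_pred_3d.argmax(axis=-1)
print(shapes_compatible(y_true.shape, pred_ids_3d.shape))  # True
```

Read this way, the error points at the extra feature axis in the model output, not at the labels.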

I hope that with all this information someone can point me in the right direction, because I'm a bit lost here.

Thank you.



