Wednesday 29 June 2022

Custom input and output for transformer model

Using the Transformer architecture from "Attention Is All You Need" and its implementation from "Transformer model for language understanding", I want to change the model so that it accepts my one-dimensional feature array as input using a Dense layer with 30 units (for the encoder), and classifies it into 6 one-hot classes using a Dense layer with 6 units (for the decoder).

The first thing I tried to change is the Encoder class:

class Encoder(tf.keras.layers.Layer):
  def __init__(self, *, num_layers, d_model, num_heads, dff, input_size,
               rate=0.1):
    super(Encoder, self).__init__()

    self.d_model = d_model
    self.num_layers = num_layers
    # self.input_layer = tf.keras.layers.Input(shape=(None, input_size))
    self.first_hidden_layer = tf.keras.layers.Dense(d_model, activation='relu', input_shape=(input_size,))
    self.pos_encoding = positional_encoding(input_size, self.d_model)

    self.enc_layers = [
        EncoderLayer(d_model=d_model, num_heads=num_heads, dff=dff, rate=rate)
        for _ in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(rate)

  def call(self, x, training, mask):

    seq_len = tf.shape(x)[1]
    x = tf.reshape(x, [64, 29])
    # x = self.input_layer(x)
    x = self.first_hidden_layer(x)
    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
    x += self.pos_encoding[:, :seq_len, :]
    x = self.dropout(x, training=training)

    for i in range(self.num_layers):
      x = self.enc_layers[i](x, training, mask)

    return x  # (batch_size, input_seq_len, d_model)


transformer = Transformer(
    num_layers=num_layers,
    d_model=d_model,
    num_heads=num_heads,
    dff=dff,
    input_size=input_size,
    target_size=target_size,
    rate=dropout_rate)

But I get a variety of errors across my different attempts to add a Dense layer at the encoder's input:

ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`
ValueError: slice index 1 of dimension 0 out of bounds.
'Tensor' object is not callable
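
The first two errors look like shape problems: the encoder reads `tf.shape(x)[1]`, which only exists for a batched input, and the hard-coded `tf.reshape(x, [64, 29])` can only succeed if the tensor holds exactly 64 * 29 values. The sketch below illustrates those shape rules, with NumPy standing in for TensorFlow (the rules are analogous). The helper name `as_token_sequence` and the idea of treating each of the 30 features as one scalar "token" are assumptions made for illustration, not part of the tutorial.

```python
import numpy as np


def as_token_sequence(features):
    """Return features shaped (batch, n_features, 1): one scalar token per feature."""
    features = np.asarray(features, dtype=np.float32)
    if features.ndim == 1:
        # A single sample of shape (30,) has no axis 1, so add a batch axis first.
        features = features[np.newaxis, :]
    # The trailing axis of size 1 gives a Dense layer a defined last dimension.
    return features[..., np.newaxis]


batch = np.zeros((64, 30), dtype=np.float32)   # 64 samples, 30 features each
single = np.zeros(30, dtype=np.float32)        # one un-batched sample

print(as_token_sequence(batch).shape)   # (64, 30, 1)
print(as_token_sequence(single).shape)  # (1, 30, 1)

# A hard-coded reshape fails as soon as the element count does not match.
try:
    batch.reshape(64, 29)
except ValueError as err:
    print("reshape failed:", err)
```

The third error is probably unrelated to shapes: `tf.keras.layers.Input(...)` returns a symbolic tensor rather than a layer, so calling `self.input_layer(x)` would try to call a tensor.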

I know this model is designed for natural language tasks such as translation, but my current knowledge of it is very limited and it would take a lot of time to learn every detail. I just need to test one of my assumptions quickly so that I can move forward with the others. If you know how to adapt this model to my custom input shape (30,) and output shape (6,), I would appreciate it a lot.
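
One possible way to get from a (30,) input to a (6,) output is sketched below, under several assumptions: the decoder is dropped entirely (a plain classifier may not need one), each feature is treated as one scalar token, and the built-in `tf.keras.layers.MultiHeadAttention` replaces the tutorial's custom attention class. The names `EncoderBlock` and `FeatureClassifier` and all hyperparameter values are hypothetical, chosen only for illustration.

```python
import numpy as np
import tensorflow as tf


def positional_encoding(length, depth):
    """Sinusoidal position encoding with shape (1, length, depth)."""
    positions = np.arange(length)[:, np.newaxis]   # (length, 1)
    dims = np.arange(depth)[np.newaxis, :]         # (1, depth)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / depth)
    angles[:, 0::2] = np.sin(angles[:, 0::2])      # even feature indices
    angles[:, 1::2] = np.cos(angles[:, 1::2])      # odd feature indices
    return tf.cast(angles[np.newaxis, ...], tf.float32)


class EncoderBlock(tf.keras.layers.Layer):
    """Hypothetical encoder block: self-attention plus feed-forward network."""

    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model)])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(rate)
        self.drop2 = tf.keras.layers.Dropout(rate)

    def call(self, x, training=False):
        attn = self.mha(x, x, training=training)   # self-attention, no mask
        x = self.norm1(x + self.drop1(attn, training=training))
        ffn = self.ffn(x)
        return self.norm2(x + self.drop2(ffn, training=training))


class FeatureClassifier(tf.keras.Model):
    """Hypothetical encoder-only classifier: (batch, input_size) -> (batch, num_classes)."""

    def __init__(self, *, input_size, num_classes, d_model, num_heads, dff,
                 num_layers, rate=0.1):
        super().__init__()
        self.d_model = d_model
        self.embed = tf.keras.layers.Dense(d_model)  # projects each scalar token
        self.pos_encoding = positional_encoding(input_size, d_model)
        self.dropout = tf.keras.layers.Dropout(rate)
        self.blocks = [EncoderBlock(d_model, num_heads, dff, rate)
                       for _ in range(num_layers)]
        self.classifier = tf.keras.layers.Dense(num_classes)  # one logit per class

    def call(self, x, training=False):
        x = tf.expand_dims(x, axis=-1)             # (batch, input_size, 1)
        x = self.embed(x)                          # (batch, input_size, d_model)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding
        x = self.dropout(x, training=training)
        for block in self.blocks:
            x = block(x, training=training)
        x = tf.reduce_mean(x, axis=1)              # pool over the 30 positions
        return self.classifier(x)                  # (batch, num_classes)


model = FeatureClassifier(input_size=30, num_classes=6, d_model=32,
                          num_heads=4, dff=64, num_layers=2)
features = tf.random.normal([64, 30])              # dummy batch of 64 samples
logits = model(features, training=False)
print(logits.shape)                                # (64, 6)
```

Because the last layer returns raw logits, one-hot labels would pair with `tf.keras.losses.CategoricalCrossentropy(from_logits=True)`. Whether treating the 30 features as a sequence is meaningful depends on the data, so this is only a starting point for testing the assumption quickly.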

