Monday, 29 July 2019

Keras Multitask learning with two different input sample size

I am implementing multitask regression model using code from the Keras API under the shared layers section.

There are two data sets, Let's call them data_1 and data_2 as follows.

data_1 : shape(1434, 185, 37)
data_2 : shape(283, 185, 37)

data_1 is consists of 1434 samples, each sample is 185 characters long and 37 shows total number of unique characters is 37 or in another words the vocab_size. Comparatively data_2 consists of 283 characters.

I convert the data_1 and data_2 into two dimensional numpy array as follows before giving it to the Embedding layer.

data_1=np.argmax(data_1, axis=2)
data_2=np.argmax(data_2, axis=2)

That makes the shape of the data as follows.

print(np.shape(data_1)) 
(1434, 185)

print(np.shape(data_2)) 
(283, 185)

Each number in the matrix represents index integer.

The multitask model is as under.

user_input = keras.layers.Input(shape=((185, )), name='Input_1')
products_input =  keras.layers.Input(shape=((185, )), name='Input_2')

shared_embed=(keras.layers.Embedding(vocab_size, 50, input_length=185))

user_vec_1 = shared_embed(user_input )
user_vec_2 = shared_embed(products_input )


input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')

input_vecs_1=keras.layers.Flatten()(input_vecs)
input_vecs_2=keras.layers.Flatten()(input_vecs)

# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu',name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)



# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear',name='output_2')(nn1) 

model = Model(inputs=[user_input , products_input], outputs=[result_a, result_b])

model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])

The model is visualized as follows. enter image description here

Then I fit the model as follows.

model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)

Error:

ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(1434, 185), (283, 185)]

Is there any way in Keras where I can use two different sample size inputs or to some trick to avoid this error to achieve my goal of multitasking regression.

Here is the minimum working code for testing.

data_1=np.array([[25,  5, 11, 24,  6],
       [25,  5, 11, 24,  6],
       [25,  0, 11, 24,  6],
       [25, 11, 28, 11, 24],
       [25, 11,  6, 11, 11]])

data_2=np.array([[25, 11, 31,  6, 11],
       [25, 11, 28, 11, 31],
       [25, 11, 11, 11, 31]])

Y_1=np.array([[2.33],
       [2.59],
       [2.59],
       [2.54],
       [4.06]])


Y_2=np.array([[2.9],
       [2.54],
       [4.06]])



user_input = keras.layers.Input(shape=((5, )), name='Input_1')
products_input =  keras.layers.Input(shape=((5, )), name='Input_2')

shared_embed=(keras.layers.Embedding(37, 3, input_length=5))
user_vec_1 = shared_embed(user_input )
user_vec_2 = shared_embed(products_input )

input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')


input_vecs_1=keras.layers.Flatten()(input_vecs) 
input_vecs_2=keras.layers.Flatten()(input_vecs)

    nn = keras.layers.Dense(90, activation='relu',name='layer_1')(input_vecs_1)
    result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)

    # Task 2 FC layers
    nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)

    result_b = keras.layers.Dense(1, activation='linear',name='output_2')(nn1)

model = Model(inputs=[user_input , products_input], outputs=[result_a, result_b])

model.compile(optimizer='rmsprop',
              loss='mse',
              metrics=['accuracy'])

model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)



from Keras Multitask learning with two different input sample size

No comments:

Post a Comment