I am implementing multitask regression model using code from the Keras API under the shared layers section.
There are two data sets, Let's call them data_1
and data_2
as follows.
data_1 : shape(1434, 185, 37)
data_2 : shape(283, 185, 37)
data_1
is consists of 1434 samples, each sample is 185 characters long and 37 shows total number of unique characters is 37 or in another words the vocab_size
. Comparatively data_2
consists of 283 characters.
I convert the data_1
and data_2
into two dimensional numpy array as follows before giving it to the Embedding layer.
data_1=np.argmax(data_1, axis=2)
data_2=np.argmax(data_2, axis=2)
That makes the shape of the data as follows.
print(np.shape(data_1))
(1434, 185)
print(np.shape(data_2))
(283, 185)
Each number in the matrix represents index integer.
The multitask model is as under.
user_input = keras.layers.Input(shape=((185, )), name='Input_1')
products_input = keras.layers.Input(shape=((185, )), name='Input_2')
shared_embed=(keras.layers.Embedding(vocab_size, 50, input_length=185))
user_vec_1 = shared_embed(user_input )
user_vec_2 = shared_embed(products_input )
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1=keras.layers.Flatten()(input_vecs)
input_vecs_2=keras.layers.Flatten()(input_vecs)
# Task 1 FC layers
nn = keras.layers.Dense(90, activation='relu',name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)
# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear',name='output_2')(nn1)
model = Model(inputs=[user_input , products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
loss='mse',
metrics=['accuracy'])
The model is visualized as follows.
Then I fit the model as follows.
model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)
Error:
ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(1434, 185), (283, 185)]
Is there any way in Keras where I can use two different sample size inputs or to some trick to avoid this error to achieve my goal of multitasking regression.
Here is the minimum working code for testing.
data_1=np.array([[25, 5, 11, 24, 6],
[25, 5, 11, 24, 6],
[25, 0, 11, 24, 6],
[25, 11, 28, 11, 24],
[25, 11, 6, 11, 11]])
data_2=np.array([[25, 11, 31, 6, 11],
[25, 11, 28, 11, 31],
[25, 11, 11, 11, 31]])
Y_1=np.array([[2.33],
[2.59],
[2.59],
[2.54],
[4.06]])
Y_2=np.array([[2.9],
[2.54],
[4.06]])
user_input = keras.layers.Input(shape=((5, )), name='Input_1')
products_input = keras.layers.Input(shape=((5, )), name='Input_2')
shared_embed=(keras.layers.Embedding(37, 3, input_length=5))
user_vec_1 = shared_embed(user_input )
user_vec_2 = shared_embed(products_input )
input_vecs = keras.layers.concatenate([user_vec_1, user_vec_2], name='concat')
input_vecs_1=keras.layers.Flatten()(input_vecs)
input_vecs_2=keras.layers.Flatten()(input_vecs)
nn = keras.layers.Dense(90, activation='relu',name='layer_1')(input_vecs_1)
result_a = keras.layers.Dense(1, activation='linear', name='output_1')(nn)
# Task 2 FC layers
nn1 = keras.layers.Dense(90, activation='relu', name='layer_2')(input_vecs_2)
result_b = keras.layers.Dense(1, activation='linear',name='output_2')(nn1)
model = Model(inputs=[user_input , products_input], outputs=[result_a, result_b])
model.compile(optimizer='rmsprop',
loss='mse',
metrics=['accuracy'])
model.fit([data_1, data_2], [Y_1,Y_2], epochs=10)
from Keras Multitask learning with two different input sample size
No comments:
Post a Comment