Saturday, 15 May 2021

How to set multiple sequences as features in KERAS

I want to build a Named Entity Recognition model with Keras. These are the tutorials I have followed:

https://valueml.com/named-entity-recognition-using-lstm-in-keras/
https://djajafer.medium.com/named-entity-recognition-and-classification-with-keras-4db04e22503d

The data looks like this:

                word label
0          Thousands     O
1                 of     O
2      demonstrators     O
3               have     O
4            marched     O
...              ...   ...
44187          there     O
44188   accidentally     O
44189             or     O
44190   deliberately     O
44191              .     O

The tutorials represent words as vectors, so they first index the words and labels. X holds my features (index sequences of words) and y holds my targets (index sequences of labels):

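from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

max_len = 30
# Word indices per sentence, post-padded with index num_words-1.
X = [[word2idx[w[0]] for w in s] for s in list_of_sentances]
X = pad_sequences(maxlen=max_len, sequences=X, padding="post", value=num_words-1)

# Label indices per sentence, post-padded with "O", then one-hot encoded.
y = [[label2idx[w[1]] for w in s] for s in list_of_sentances]
y = pad_sequences(maxlen=max_len, sequences=y, padding="post", value=label2idx["O"])
y = [to_categorical(i, num_classes=num_labels) for i in y]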
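
The snippet relies on word2idx, label2idx, num_words and num_labels, which are not shown here. A minimal sketch of how such lookups might be built, assuming list_of_sentances is a list of sentences where each sentence is a list of (word, label) tuples. The toy sentences, the "B-geo" label and the "ENDPAD" padding token are assumptions made for illustration:

```python
# Toy data (hypothetical): each sentence is a list of (word, label) tuples.
list_of_sentances = [
    [("Thousands", "O"), ("of", "O"), ("demonstrators", "O")],
    [("London", "B-geo"), ("marched", "O")],
]

# Unique words in sorted order, plus a padding token appended last so that
# its index equals num_words - 1 (the value used when padding X).
words = sorted({w for s in list_of_sentances for w, _ in s})
words.append("ENDPAD")  # assumed padding token name

# Unique labels in sorted order.
labels = sorted({l for s in list_of_sentances for _, l in s})

num_words = len(words)
num_labels = len(labels)

# Lookup tables from token / label to integer index.
word2idx = {w: i for i, w in enumerate(words)}
label2idx = {l: i for i, l in enumerate(labels)}
```

With lookups like these, padding X with num_words-1 fills the tail of each short sentence with the index of the padding token rather than with the index of a real word.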

But what if I have a dataset like this (image not available):

Here I have another column, POS. How can I add the values of the POS column to my features? Basically, I do not want only word values in my X; I also want POS values (or any other values) in my X. What if I have multiple columns, such as:

word
POS
is_capital_letter
word_length

...

How can I add all of these columns to my features?
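
One common way to approach this, sketched here under assumptions rather than taken from the linked tutorials, is to build one padded array per feature column and feed them to a model with several inputs: categorical columns (word, POS) go through their own Embedding layers, numeric columns (is_capital_letter, word_length) are passed in as-is, and everything is concatenated per token. The toy sentences, the POS tags, the "PAD" token, the layer sizes and the helper names build_index, pad_rows and build_model are all hypothetical:

```python
import numpy as np

max_len = 5  # kept short for the toy data; the question uses 30

# Hypothetical toy data: each token is a (word, POS) tuple.
sentences = [
    [("Thousands", "NNS"), ("of", "IN"), ("demonstrators", "NNS")],
    [("London", "NNP"), ("marched", "VBN")],
]

def build_index(values, pad_token="PAD"):
    """Map each distinct value to an integer id; the pad token gets id 0."""
    return {v: i for i, v in enumerate([pad_token] + sorted(set(values)))}

def pad_rows(rows, pad_value):
    """Post-pad (or cut) every row to max_len entries."""
    return [list(r[:max_len]) + [pad_value] * (max_len - len(r)) for r in rows]

word2idx = build_index(w for s in sentences for w, _ in s)
pos2idx = build_index(p for s in sentences for _, p in s)

# One integer array per categorical column: shape (n_sentences, max_len).
X_word = np.array(pad_rows([[word2idx[w] for w, _ in s] for s in sentences], 0))
X_pos = np.array(pad_rows([[pos2idx[p] for _, p in s] for s in sentences], 0))

# Numeric columns stacked per token: shape (n_sentences, max_len, 2).
# Column 0: first letter is uppercase, column 1: word length.
X_num = np.array(
    pad_rows(
        [[[float(w[0].isupper()), float(len(w))] for w, _ in s] for s in sentences],
        [0.0, 0.0],
    ),
    dtype="float32",
)

def build_model(num_words, num_pos, num_labels):
    """Sketch of a multi-input tagger. TensorFlow is imported here so the
    feature preparation above can run without it."""
    from tensorflow.keras.layers import (Input, Embedding, Concatenate,
                                         Bidirectional, LSTM,
                                         TimeDistributed, Dense)
    from tensorflow.keras.models import Model

    word_in = Input(shape=(max_len,), name="word_ids")
    pos_in = Input(shape=(max_len,), name="pos_ids")
    num_in = Input(shape=(max_len, 2), name="numeric")

    # Each categorical column gets its own embedding (sizes are arbitrary).
    word_emb = Embedding(input_dim=num_words, output_dim=50)(word_in)
    pos_emb = Embedding(input_dim=num_pos, output_dim=8)(pos_in)

    # Join all per-token features along the last axis.
    merged = Concatenate(axis=-1)([word_emb, pos_emb, num_in])
    hidden = Bidirectional(LSTM(64, return_sequences=True))(merged)
    out = TimeDistributed(Dense(num_labels, activation="softmax"))(hidden)

    model = Model(inputs=[word_in, pos_in, num_in], outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Possible usage, with y one-hot encoded to shape (n_sentences, max_len, num_labels):
# model = build_model(len(word2idx), len(pos2idx), num_labels)
# model.fit([X_word, X_pos, X_num], np.array(y), batch_size=32, epochs=5)
```

Further columns would follow the same pattern: another Input plus Embedding for each extra categorical column, or another slot in the last axis of X_num for each extra numeric column. Scaling numeric values such as word length to a small range may help training, but that is a tuning choice rather than a requirement.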


