Tuesday, 8 September 2020

Use WMD function of gensim for sentence clustering

I have a list of sentences that I want to cluster by similarity using the Word Mover's Distance (WMD). I am using a word2vec model from gensim to create the embeddings for my words.

The clustering algorithms I know (nltk, sklearn) take numeric vectors as input, so I would need to pass each sentence as an array (or list) of the embeddings of its words. I think I can use the nltk clustering methods with a custom distance function, and I want to use the WMD as this custom function. But the WMD function of gensim takes two lists of strings as input.

Is there a prebuilt WMD function that takes the embeddings rather than the strings as input? Or is there a clustering algorithm (k-means or something else) that can handle lists of strings as input and accept the WMD as a custom distance function?

Thanks
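
One possible direction, as a minimal sketch rather than a definitive answer: since gensim's `wmdistance` works on two lists of tokens, the sentences can stay as token lists if the clustering step only needs pairwise distances. The sketch below precomputes a distance matrix and runs a small k-medoids loop on it. k-medoids is swapped in for k-means here because k-means needs to average points, which is not defined for token lists. The helper names `distance_matrix` and `k_medoids` are hypothetical, and `jaccard_distance` is only a stand-in so the example runs without a trained model; with gensim, the assumption is that a callable such as `lambda a, b: model.wv.wmdistance(a, b)` would be passed instead.

```python
import random
from typing import Callable, List, Sequence, Tuple

Tokens = Sequence[str]


def jaccard_distance(a: Tokens, b: Tokens) -> float:
    """Stand-in distance on token lists (NOT the WMD), used only for the demo."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def distance_matrix(docs: List[Tokens],
                    dist: Callable[[Tokens, Tokens], float]) -> List[List[float]]:
    """Symmetric matrix of pairwise distances between tokenized sentences."""
    n = len(docs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(docs[i], docs[j])
    return d


def assign(d: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d: List[List[float]], k: int,
              max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Basic k-medoids on a precomputed distance matrix (hypothetical helper)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(d)), k)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(min(members,
                                   key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(d, medoids), medoids


sentences = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
docs = [s.split() for s in sentences]

# With gensim (assumption: `model` is a trained Word2Vec model), replace the
# stand-in with:  dist = lambda a, b: model.wv.wmdistance(a, b)
dist = jaccard_distance

d = distance_matrix(docs, dist)
labels, medoids = k_medoids(d, k=2)
print(labels, medoids)
```

Any clustering method that accepts a precomputed distance matrix could be used in place of the hand-written loop. Note that computing all pairwise WMD values grows quadratically with the number of sentences, so this is only practical for moderately sized lists.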


