Wednesday, 17 February 2021

Finding the best-matched word in a large vocab list

I have a pandas DataFrame with two columns, Potential Word and Fixed Word. The Potential Word column contains words from different languages, both misspelled and correctly spelled. The Fixed Word column contains the correct word corresponding to each Potential Word.

Below is some sample data:

Potential Word    Fixed Word
Exemple           Example
pipol             People
pimple            Pimple
Iunik             unique

My vocab DataFrame contains 600K unique rows.

My Solution:

key = given_word  # the word to be fixed
glob_match_value = 0  # best similarity seen so far
potential_fixed_word = ''
match_threshold = 0.65
for each in df['Potential Word']:
    # match is a function that returns a similarity value of two strings
    match_value = match(each, key)
    if match_value > glob_match_value and match_value > match_threshold:
        glob_match_value = match_value
        potential_fixed_word = each
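
For reference, the loop above can be written as a self-contained function. This is only a sketch: the original match function is not shown, so difflib.SequenceMatcher stands in for it here, and the VOCAB list and best_match name are hypothetical. It also keeps the Fixed Word next to each Potential Word so the correction can be returned directly.

```python
from difflib import SequenceMatcher

# Sample rows from the post, as (Potential Word, Fixed Word) pairs.
VOCAB = [
    ("Exemple", "Example"),
    ("pipol", "People"),
    ("pimple", "Pimple"),
    ("Iunik", "unique"),
]


def match(a, b):
    """Stand-in similarity in [0, 1]. NOT the original match function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(key, vocab, match_threshold=0.65):
    """Linear scan over the vocab; returns (potential, fixed, score) or None."""
    best = None
    glob_match_value = 0.0
    for potential, fixed in vocab:
        match_value = match(potential, key)
        if match_value > glob_match_value and match_value > match_threshold:
            glob_match_value = match_value
            best = (potential, fixed, match_value)
    return best


print(best_match("pipol", VOCAB))
print(best_match("zzzz", VOCAB))
```

Every call still compares the key against all rows, which is the cost described in the Problem section.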

Problem

The problem with my code is that it takes a lot of time to fix each word, because the loop runs through the entire vocab list. When a word is missing from the vocab, it takes almost 5 or 6 seconds to fix a sentence of 10 to 12 words. The match function itself performs decently, so the loop over the vocab is what needs to be optimized.

I need an optimized solution. Please help me here.
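
One direction that might help, as a minimal sketch rather than a tested solution for 600K rows: answer exact hits from a dictionary, and for misses only score the vocab words that share at least one character bigram with the input. The VocabIndex class, the bigrams helper, and the similarity function (difflib.SequenceMatcher, standing in for the unshown match function) are all assumptions made for illustration. Lowercasing is also an assumption and may merge rows that differ only by case.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for the post's match function (an assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def bigrams(word):
    """Set of character bigrams; a one-letter word yields itself."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}


class VocabIndex:
    """Hypothetical helper: exact-match dict plus a bigram inverted index."""

    def __init__(self, pairs):
        self.exact = {}                    # potential word -> fixed word
        self.by_bigram = defaultdict(set)  # bigram -> potential words
        for potential, fixed in pairs:
            key = potential.lower()
            self.exact[key] = fixed        # later duplicates overwrite earlier
            for gram in bigrams(key):
                self.by_bigram[gram].add(key)

    def lookup(self, word, match_threshold=0.65):
        key = word.lower()
        # 1. Exact hit: no similarity computation at all.
        if key in self.exact:
            return self.exact[key]
        # 2. Collect only the words sharing at least one bigram.
        candidates = set()
        for gram in bigrams(key):
            candidates |= self.by_bigram.get(gram, set())
        # 3. Score the (much smaller) candidate set.
        best_word, best_score = None, match_threshold
        for cand in sorted(candidates):    # sorted for repeatable ties
            score = similarity(cand, key)
            if score > best_score:
                best_word, best_score = cand, score
        return self.exact[best_word] if best_word is not None else None


index = VocabIndex([
    ("Exemple", "Example"),
    ("pipol", "People"),
    ("pimple", "Pimple"),
    ("Iunik", "unique"),
])
print(index.lookup("pipol"))    # exact hit
print(index.lookup("exampel"))  # fuzzy hit through shared bigrams
```

The trade-off is that a vocab word sharing no bigram with the input is never scored, so results can differ from the full scan. The index is built once, and wrapping lookup in functools.lru_cache could additionally avoid repeated work for words that recur across sentences.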



