Hemant Vishwakarma: How to get all fuzzy matching substrings between two strings in python?

Friday, 29 April 2022

How to get all fuzzy matching substrings between two strings in python?

Say I have three example strings:

text1 = "Patient has checked in for abdominal pain which started 3 days ago. Patient was prescribed idx 20 mg every 4 hours."
text2 = "The time of discomfort was 3 days ago."
text3 = "John was given a prescription of idx, 20mg to be given every four hours"

If I collected the exact matching substrings for each pair of these texts, I would get:

text1_text2_common = [
    '3 days ago.',
]

text2_text3_common = [
    'of',
]

text1_text3_common = [
    'was',
    'idx',
    'every',
    'hours',
]

What I am looking for is fuzzy matching, using something like the Levenshtein distance [ https://ift.tt/5kYXErf ]. Even if two substrings are not identical, they should be selected as a match when they are similar enough under some criterion.
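For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal pure-Python sketch of it. The function names `levenshtein` and `similarity` are my own and do not come from any library, and the 0..1 normalization is just one possible way to express "similar enough".

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # drop ca
                current[j - 1] + 1,       # add cb
                previous[j - 1] + cost,   # keep or substitute
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Illustrative normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Under this measure "20 mg" and "20mg" are one edit apart, while "4" and "four" are four edits apart, so a purely character-based threshold would treat the digit and the spelled-out number as unrelated.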

So ideally I am looking for something like:

text1_text3_common_fuzzy = [
    'prescription of idx, 20mg to be given every four hours'
]
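One possible starting point, sketched here as a heuristic and not as a definitive answer, is to take the exact matching blocks that the standard library's `difflib.SequenceMatcher` finds and glue neighbouring blocks together when only a few characters separate them. Note that this swaps in difflib's block matching and is not a Levenshtein computation. The function name `fuzzy_common_substrings` and the parameters `max_gap`, `min_block`, and `min_length` are hypothetical, and their default values are guesses that would need tuning.

```python
from difflib import SequenceMatcher


def fuzzy_common_substrings(a, b, max_gap=4, min_block=3, min_length=10):
    """Return (substring_of_a, substring_of_b) pairs that match approximately.

    Heuristic: start from difflib's exact matching blocks and merge
    consecutive blocks when at most max_gap characters separate them
    in both strings. The comparison is character-based and case-sensitive.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    regions = []  # each region is [a_start, a_end, b_start, b_end]
    for block in matcher.get_matching_blocks():
        if block.size < min_block:
            continue  # ignore tiny blocks and the zero-size sentinel at the end
        a_end, b_end = block.a + block.size, block.b + block.size
        if regions:
            last = regions[-1]
            if block.a - last[1] <= max_gap and block.b - last[3] <= max_gap:
                last[1], last[3] = a_end, b_end  # extend the current region
                continue
        regions.append([block.a, a_end, block.b, b_end])
    # Keep only regions that are long enough in both strings.
    return [
        (a[a0:a1], b[b0:b1])
        for a0, a1, b0, b1 in regions
        if min(a1 - a0, b1 - b0) >= min_length
    ]
```

On the shortened inputs "prescribed idx 20 mg every 4 hours" and "idx, 20mg every four hours" this returns the single pair ("idx 20 mg every 4 hours", "idx, 20mg every four hours"). Because the matching blocks come from one left-to-right alignment, phrases that appear in a different order in the two texts would be missed, and longer insertions such as "to be given" need a larger `max_gap`.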
