Friday, 29 April 2022

How to get all fuzzy matching substrings between two strings in python?

Say I have three example strings:

text1 = "Patient has checked in for abdominal pain which started 3 days ago. Patient was prescribed idx 20 mg every 4 hours."
text2 = "The time of discomfort was 3 days ago."
text3 = "John was given a prescription of idx, 20mg to be given every four hours"

If I got all the exactly matching substrings between each pair of strings, I would get:

text1_text2_common = [
    '3 days ago.',
]

text2_text3_common = [
    'of',
]

text1_text3_common = [
    'was',
    'idx',
    'every',
    'hours'
]
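For reference, the exact (non-fuzzy) word-level version can be sketched with the standard library's `difflib.SequenceMatcher`. The sketch assumes case-insensitive word tokens with punctuation dropped; the question does not specify this. The helper names `tokenize` and `exact_common_substrings` are hypothetical. Note that `get_matching_blocks()` only returns non-crossing matches in order, so a word that the two strings share out of order is missed.

```python
import difflib
import re


def tokenize(text):
    # Assumption: case-insensitive word tokens; punctuation such as "." or "," is dropped
    return re.findall(r"\w+", text.lower())


def exact_common_substrings(a, b, min_words=1):
    """Return runs of consecutive words that appear exactly in both a and b (hypothetical helper)."""
    ta, tb = tokenize(a), tokenize(b)
    # autojunk=False so frequent words are never discarded as "junk"
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    # Each block is (a, b, size) with ta[a:a+size] == tb[b:b+size];
    # the final dummy block has size 0 and is filtered out by min_words
    return [
        " ".join(ta[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]
```

With the three strings above, this returns `['3 days ago']`, `['of']` and `['was', 'idx', 'every', 'hours']` for `(text1, text2)`, `(text2, text3)` and `(text1, text3)` respectively.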

What I am looking for is fuzzy matching, using something like the Levenshtein distance [ https://ift.tt/5kYXErf ]. That way, two substrings that are not identical would still be selected as a common substring if they are similar enough under some criterion.
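The following is a minimal plain-Python sketch of the Levenshtein distance. It uses the standard dynamic-programming recurrence and keeps only two rows of the table. A normalized similarity score is added as one possible way to express "similar enough". Dividing by the longer string's length is an assumed convention, not something the question specifies.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; normalizing by the longer length is an assumed convention."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```

For example, `levenshtein("every 4 hours", "every four hours")` is 4, which gives a similarity of 0.75.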

So ideally, I am looking for something like:

text1_text3_common_fuzzy = [
    'prescription of idx, 20mg to be given every four hours'
]
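One possible approach treats the problem as word-level local alignment. This is a Smith-Waterman-style dynamic program, where two words count as a match when their similarity is at least `word_threshold`. The sketch below is hypothetical throughout: the function name, the scoring constants (`match=2`, `mismatch=-1`, `gap=-1`) and the 0.7 threshold are untuned guesses. Word similarity here uses `difflib.SequenceMatcher.ratio()` (Ratcliff/Obershelp) as a stand-in; a Levenshtein-based score could be swapped in. The function returns only the single best-scoring region. Pairs such as "4" vs "four" will not match without extra normalization, and the exact span returned for `text1` and `text3` depends on the constants chosen.

```python
import difflib
import string


def _norm(word):
    # Compare words case-insensitively, ignoring surrounding punctuation ("idx," == "idx")
    return word.strip(string.punctuation).lower()


def _similar(x, y, threshold):
    # Stand-in word similarity: difflib's ratio, not Levenshtein (an assumption)
    return difflib.SequenceMatcher(None, _norm(x), _norm(y)).ratio() >= threshold


def fuzzy_common_substring(a, b, word_threshold=0.7, match=2, mismatch=-1, gap=-1):
    """Best fuzzy common word run of a and b via word-level Smith-Waterman.

    Hypothetical helper; scoring constants are illustrative and untuned.
    Returns (substring_of_a, substring_of_b, score), or ("", "", 0) if nothing matches.
    """
    ta, tb = a.split(), b.split()
    # H[i][j] = best score of a local alignment ending at ta[i-1] and tb[j-1]
    H = [[0] * (len(tb) + 1) for _ in range(len(ta) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(ta) + 1):
        for j in range(1, len(tb) + 1):
            s = match if _similar(ta[i - 1], tb[j - 1], word_threshold) else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    if best == 0:
        return "", "", 0
    # Trace back from the best cell until the score drops to zero
    i, j = best_i, best_j
    while H[i][j] > 0:
        s = match if _similar(ta[i - 1], tb[j - 1], word_threshold) else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1  # aligned word pair
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return " ".join(ta[i:best_i]), " ".join(tb[j:best_j]), best
```

For example, `fuzzy_common_substring("xx hello world yy", "zz hello wrld qq")` returns `("hello world", "hello wrld", 4)`. To collect every fuzzy common region instead of only the best one, one option is to blank out the returned words and run it again.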


