I was given this code that quickly returns whether the Levenshtein distance between two strings is exactly 2.
def li(s, i):
try:
return s[i]
except IndexError:
return None
def f(str1, str2):
t = [4, 4, 1, 2, 3]
for i, str1_symb in enumerate(str1):
p = 4
res = []
for j, t_val in enumerate(t):
p = min(t_val - (str1_symb == li(str2, i + j - 2)), p, li(t, j + 1) or 4) + 1
res.append(p)
t = res
return li(t, len(str2) - len(str1) + 2) == 3
You can test it with:
f("zzzzfood", "zzzzdodod")
for example which will return True
and
f("zzzzfood", "zzzzdodo")
which will return False.
The standard algorithm for computing the Levenshtein distance builds a dynamic programming table and fills in the elements from left to right, top to bottom using the formula:
(from the wiki page linked above)
If you only want to return if the Levenshtein distance is at most 2 you can only look at cells of the dynamic programming that are at most 2 right or left from the diagonal.
The code above doesn't obviously do that and I can't work out what it is doing. Some particularly mysterious parts:
- What is the role of t = [4, 4, 1, 2, 3]?
- The li() function is taking both a string and a list in this code. It only returns None if the index
i
is greater than or equal tolen(s)
. Sometimesi
will be negative where it will still return a letter froms
. li(t, j + 1) or 4
returns 4 ifli(t, j + 1)
isNone
but I don't know what its purpose is.- What is the purpose/meaning of
p
?
Can anyone decipher it?
from Can anyone explain this code for computing Levenshtein distance?
No comments:
Post a Comment