Given an input string, I would like to find the set of all other strings that are within Levenshtein distance 2. For example, if the input string is "aaa" and the alphabet is ['a', 'b'] then we have:
{'baaa', 'aab', 'a', 'aaaaa', 'baab', 'abbaa', 'abaa', 'aaabb', 'abb', 'aaab', 'ababa', 'aa', 'aabb', 'baba', 'baaab', 'aabab', 'aaaab', 'abaaa', 'aabaa', 'bbaaa', 'abaab', 'aaaa', 'baaaa', 'bab', 'bba', 'aba', 'aaaba', 'ba', 'aabba', 'abab', 'baa', 'aaa', 'bbaa', 'baaba', 'aaba', 'abba', 'ab', 'babaa'}
I have code to do this but it is inefficient. Here it is using all printable ascii characters as the alphabet and the input string aaaaaaaaaa.
import string
input_string = "a" * 10
f = (
lambda input_string, dist=2, i=0: dist * input_string[i - 1 :]
and {
k[:i] + char + k[i + w:]
for k in f(input_string, dist - 1)
for char in [""] + list(string.printable)
for w in (0, 1)
}
| f(input_string, dist, i + 1)
or {input_string}
)
f(input_string)
I need to run this many times with different input strings. This code takes 2.9s currently on my desktop to make the 1631129 different strings. Can anyone see how to make it much faster?
from Can this code to find the neighborhood of a string be sped up?
No comments:
Post a Comment