I am trying to find repeated substrings (not words) in a text.
x = 'This is a sample text and this is lowercase text that is repeated.'
In this example, the substring ' text ' should not be returned, because the matching part is only 6 characters long. The substring 'his is ' (7 characters, from "This is" and "this is") is the result I expect.
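The criterion above can be written down as a small check. This is a minimal sketch that assumes "more than 6 characters" means a minimum length of 7 and that overlapping occurrences count. The helper names `occurrences` and `qualifies` are hypothetical and are used only for illustration.

```python
def occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, including overlapping ones.

    str.count would skip overlapping matches, so step forward one
    character after each hit instead.
    """
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count


def qualifies(text: str, sub: str, min_len: int = 7) -> bool:
    """True if sub has at least min_len characters (assumed: 'more than 6')
    and occurs more than once in text."""
    return len(sub) >= min_len and occurrences(text, sub) > 1


x = 'This is a sample text and this is lowercase text that is repeated.'
print(len(' text '), occurrences(x, ' text '), qualifies(x, ' text '))     # 6 2 False
print(len('his is '), occurrences(x, 'his is '), qualifies(x, 'his is '))  # 7 2 True
```

Both substrings occur twice, so under this reading the length threshold alone is what separates them.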
I tried using range, Counter and a regular expression:
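import re
from collections import Counter

duplist = list()
for i in range(1, 30):
    # split x into consecutive chunks of up to i characters
    mylist = re.findall('.{1,' + str(i) + '}', x)
    # keep the chunks that appear more than once
    duplist.append([k for k, v in Counter(mylist).items() if v > 1])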
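One likely reason the attempt above misses matches is that `re.findall('.{1,n}', x)` cuts the text into consecutive, non-overlapping chunks that always start at index 0, so a repeat is only found if both copies happen to start on a multiple of `n`. The sketch below keeps the `Counter` idea but slides a window one character at a time. It assumes a minimum length of 7 ("more than 6 characters"), and it assumes you only want maximal repeats, not every shorter piece of a longer repeat. The function names `repeated_substrings` and `maximal_repeats` are hypothetical. It needs Python 3.9+ for the `set[str]` annotations.

```python
from collections import Counter


def repeated_substrings(text: str, length: int) -> set[str]:
    """Return every substring of exactly `length` characters that occurs more than once.

    The window moves one character at a time, so overlapping windows and
    repeats at any offset are all counted.
    """
    windows = Counter(text[i:i + length] for i in range(len(text) - length + 1))
    return {s for s, n in windows.items() if n > 1}


def maximal_repeats(text: str, min_len: int = 7) -> set[str]:
    """Repeated substrings of at least min_len characters, keeping only those
    that are not contained in a longer repeated substring."""
    found: set[str] = set()
    for length in range(min_len, len(text)):
        current = repeated_substrings(text, length)
        if not current:
            # A repeat of length L+1 would contain a repeat of length L,
            # so once a length has no repeats, no longer length can have any.
            break
        found |= current
    return {s for s in found if not any(s != t and s in t for t in found)}


x = 'This is a sample text and this is lowercase text that is repeated.'
print(sorted(repeated_substrings(x, 7)))  # ['e text ', 'his is ']
print(sorted(maximal_repeats(x)))         # ['e text ', 'his is ']
```

Note that with this reading 'e text ' is reported alongside 'his is ': the ' text ' match extends one character to the left because both "sample" and "lowercase" end in 'e', which gives 7 characters occurring twice. For long texts, checking every length this way is quadratic or worse; a suffix array or suffix automaton is the usual choice when that matters.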
from More than 6 characters string repeated