Saturday, 25 November 2023

Repeated strings of more than 6 characters

I am trying to find repeated substrings (not just repeated words) in a text, counting only those longer than 6 characters.

x = 'This is a sample text and this is lowercase text that is repeated.'

In this example, ' text ' should not be returned, because it is only 6 characters long (including the two spaces). The string 'his is ', which is 7 characters long and appears in both 'This is' and 'this is', is the expected result.
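To make the rule concrete: a string qualifies if it is at least 7 characters long and occurs at least twice, where occurrences may overlap. The minimal sketch below checks this for a single candidate; the helper names count_overlapping and qualifies, and the min_len parameter, are hypothetical and not part of the original question.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlapping matches."""
    count, start = 0, text.find(sub)
    while start != -1:
        count += 1
        # Restart one character later so overlapping matches are found too.
        start = text.find(sub, start + 1)
    return count


def qualifies(text, sub, min_len=7):
    """True if sub has at least min_len characters and occurs more than once.

    min_len is a hypothetical parameter; 7 encodes "more than 6 characters".
    """
    return len(sub) >= min_len and count_overlapping(text, sub) > 1


x = 'This is a sample text and this is lowercase text that is repeated.'
print(len(' text '), count_overlapping(x, ' text '), qualifies(x, ' text '))     # 6 2 False
print(len('his is '), count_overlapping(x, 'his is '), qualifies(x, 'his is '))  # 7 2 True
```

Both strings occur twice, so the length threshold alone is what separates them.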

I tried combining range(), collections.Counter and a regular expression:

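import re
from collections import Counter

duplist = []
for i in range(1, 30):
    # '.{1,i}' cuts x into consecutive, non-overlapping chunks of up to i characters,
    # so 'his is ' (which starts at index 1) is never produced as a chunk
    mylist = re.findall('.{1,' + str(i) + '}', x)
    # keep the chunks that occur more than once
    duplist.append([k for k, v in Counter(mylist).items() if v > 1])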
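The chunks produced by '.{1,i}' never overlap, which is why the attempt above misses most candidates. One possible fix, sketched here and not taken from the original post, keeps the same tools (re and Counter) but wraps the pattern in a zero-width lookahead, so that findall returns every window of a given length, overlapping ones included. The function name repeated_substrings and its min_len parameter are hypothetical. Note that under the literal "more than 6 characters" rule, 'e text ' (from "sample text" and "lowercase text") also repeats, so it is returned alongside 'his is '.

```python
import re
from collections import Counter


def repeated_substrings(text, min_len=7):
    """Return maximal substrings of at least min_len characters that occur
    more than once in text (overlapping occurrences are counted).

    repeated_substrings and min_len are hypothetical names for this sketch.
    """
    repeats = []
    for length in range(min_len, len(text)):
        # The zero-width lookahead makes findall report the window starting
        # at every position, not just consecutive, non-overlapping chunks.
        windows = re.findall('(?=(.{%d}))' % length, text, flags=re.DOTALL)
        found = [s for s, n in Counter(windows).items() if n > 1]
        if not found:
            # Any repeated string of length L+1 has a repeated prefix of
            # length L, so nothing longer can repeat either.
            break
        repeats.extend(found)
    # Keep only maximal results: drop any string contained in a longer one.
    return [s for s in repeats
            if not any(s != t and s in t for t in repeats)]


x = 'This is a sample text and this is lowercase text that is repeated.'
print(repeated_substrings(x))  # ['his is ', 'e text ']
```

The loop is quadratic in the length of the text, which is fine for sentences like this one; for long documents, a suffix array or suffix automaton is the usual way to find repeated substrings more efficiently.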