I have a (very large) list similar to:
a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
and I want to extract from it a list of lists like:
result = [['A', 'B', 'A', 'B', 'A'], ['D', 'E', 'D', 'E', 'D']]
The repeating patterns can vary; for example, there can also be larger gaps, such as:
['A', 'B', 'C', 'A', 'D', 'E', 'A'] (with a 'jump' over two elements)
I have written some very simple code that seems to work:
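tolerance = 2          # allowed number of mismatches ("jumps")
counter = 0            # set to 1 once a start index has been recorded
start, stop = 0, 0
result = []            # stays empty if no sublist is found
for idx in range(len(a) - 1):
    if a[idx] == a[idx+1] and counter == 0:
        start = idx
        counter += 1
    elif a[idx] == a[idx+1] and counter != 0:
        if tolerance <= 0:
            stop = idx
            tolerance = 2      # reset after recording a stop
    elif a[idx] != a[idx+1]:
        tolerance -= 1         # one more mismatch
if start != 0 and stop != 0:
    result = [a[start:stop + 1]]   # slice from start to stop, inclusive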
But (1) it is very cumbersome and (2) I need to apply it to very large lists. Is there a more concise and overall better way of writing it?
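One possibly faster approach (a sketch, not the code above): record the positions of each value in a single pass, then split each value's position list wherever two successive occurrences are more than tolerance elements apart. This assumes the tolerance applies to each gap separately rather than to the total number of jumps, and find_runs is a hypothetical name. Grouping is O(n) time and O(n) memory; only the resulting runs are sorted.

```python
from collections import defaultdict


def find_runs(a, tolerance):
    """Return sorted (start, stop) pairs, stop inclusive, with a[start] == a[stop]
    and successive occurrences of that value separated by at most `tolerance`
    other elements. Values that occur only once in a span are ignored."""
    positions = defaultdict(list)          # value -> increasing list of indices
    for i, x in enumerate(a):
        positions[x].append(i)

    runs = []
    for idxs in positions.values():
        run_start = prev = idxs[0]
        for i in idxs[1:]:
            if i - prev - 1 > tolerance:   # gap too long: close the current run
                if prev > run_start:       # keep it only if it has 2+ occurrences
                    runs.append((run_start, prev))
                run_start = i
            prev = i
        if prev > run_start:               # flush the last run for this value
            runs.append((run_start, prev))
    return sorted(runs)


a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
runs = find_runs(a, tolerance=2)
print(runs)   # [(0, 4), (1, 3), (6, 10), (7, 9)]
print([a[s:e + 1] for s, e in runs])
```

Note that this still reports the shorter runs such as ['B', 'A', 'B'] (indices 1 to 3), which lie inside a longer run; filtering those out is a separate step.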
EDIT: As @Kasramvd correctly pointed out, I need the largest sublist that satisfies the requirement (at most tolerance jumps between the start and end elements), so I take:
['A', 'B', 'A', 'B', 'A'] instead of ['B', 'A', 'B']
because the former includes the latter.
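Keeping only the largest sublists amounts to dropping every (start, stop) interval that lies inside another one. A minimal sketch, assuming the candidates are given as inclusive index pairs (keep_maximal is a hypothetical name, and the sample intervals below are written out by hand for the list a with tolerance 2): sort by start ascending and stop descending, then sweep while tracking the furthest stop seen so far.

```python
def keep_maximal(runs):
    """Drop every inclusive (start, stop) interval contained in another one."""
    # Sort by start ascending and, for equal starts, by stop descending,
    # so an enclosing interval is always visited before those it contains.
    ordered = sorted(runs, key=lambda r: (r[0], -r[1]))
    kept = []
    furthest_stop = -1
    for start, stop in ordered:
        if stop > furthest_stop:           # not covered by any earlier interval
            kept.append((start, stop))
            furthest_stop = stop
    return kept


a = ['A', 'B', 'A', 'B', 'A', 'C', 'D', 'E', 'D', 'E', 'D', 'F', 'G', 'A', 'B']
runs = [(0, 4), (1, 3), (6, 10), (7, 9)]   # hand-written candidate intervals
result = [a[s:e + 1] for s, e in keep_maximal(runs)]
print(result)   # [['A', 'B', 'A', 'B', 'A'], ['D', 'E', 'D', 'E', 'D']]
```

Intervals that only partially overlap, such as (0, 4) and (3, 6), are both kept, since neither includes the other; whether that is the desired behaviour is an assumption.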
It would also be good if the code could select elements UP TO the given tolerance. For example, if the tolerance (the maximum number of consecutive elements not equal to the start or end element) is 2, it should also return sublists such as:
['A', 'A', 'A', 'B', 'A', 'B', 'A', 'C', 'D', 'A']
which contains gaps with tolerances 0, 1 and 2.
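The "up to the tolerance" rule can be stated as a small checker. This sketch assumes the tolerance is the longest run of consecutive elements differing from the start element (which is how the gaps of 0, 1 and 2 above seem to arise) and that a valid sublist needs at least two occurrences; satisfies_tolerance is a hypothetical name. It can also be used to validate the output of a faster method on small inputs.

```python
def satisfies_tolerance(sub, tolerance):
    """True if sub starts and ends on the same element and no more than
    `tolerance` consecutive elements between occurrences of it differ from it."""
    if len(sub) < 2 or sub[0] != sub[-1]:
        return False
    anchor = sub[0]
    gap = 0                                  # current run of non-anchor elements
    for x in sub[1:]:
        if x == anchor:
            gap = 0                          # back on the anchor: reset the gap
        else:
            gap += 1
            if gap > tolerance:
                return False
    return True


example = ['A', 'A', 'A', 'B', 'A', 'B', 'A', 'C', 'D', 'A']
print(satisfies_tolerance(example, 2))   # True  (gaps of 0, 1 and 2 elements)
print(satisfies_tolerance(example, 1))   # False (the 'C', 'D' gap is too long)
```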
from Select sublists from python list, beginning and ending on the same element