I currently have a list of dictionaries that looks like this:
total_list = [
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'Germany'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'userd@email.com', 'id': 4, 'country': 'France'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
    ....
]
I want to split it primarily by size (let's say 3 items per sublist), but I also want to make sure that all entries for the same user end up in the same sublist.
So the result I am trying to create is:
list_a = [
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'Germany'},
]
list_b = [
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'userd@email.com', 'id': 4, 'country': 'France'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
    ....
]
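To pin down the two requirements, here is a minimal sketch of a checker for a proposed split. `is_valid_split` is a hypothetical helper, not part of the question; it assumes the size limit is a hard maximum and that "same user" means "same `email` value".

```python
from collections import defaultdict


def is_valid_split(sublists, max_size):
    """Hypothetical helper: True if every sublist has at most max_size items
    and every email appears in exactly one sublist."""
    where = defaultdict(set)  # email -> indices of the sublists containing it
    for index, sublist in enumerate(sublists):
        if len(sublist) > max_size:
            return False
        for item in sublist:
            where[item['email']].add(index)
    return all(len(indices) == 1 for indices in where.values())


list_a = [
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'Germany'},
]
list_b = [
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'userd@email.com', 'id': 4, 'country': 'France'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
]

print(is_valid_split([list_a, list_b], 3))  # True
# Moving usera's Germany entry into another sublist breaks the grouping rule:
print(is_valid_split([list_a[:2], list_a[2:] + list_b[:1], list_b[1:]], 3))  # False
```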
In the example I provided, the entries for each user happen to be close to each other in the list, but in reality they could be spread much further apart. I was considering sorting the list by email and then splitting it, but I am not sure what happens if a group of items that belongs together happens to straddle the exact point where the main list is divided.
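A sketch of why sorting helps, assuming the six sample entries above: once the list is sorted by email, each user's entries form one contiguous run, and `itertools.groupby` (which only merges adjacent items with equal keys, hence the sort) can collect each run. A split could then treat each run as an indivisible unit instead of cutting purely by index.

```python
from itertools import groupby
from operator import itemgetter

total_list = [
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'Germany'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'userd@email.com', 'id': 4, 'country': 'France'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
]

by_email = itemgetter('email')
total_list.sort(key=by_email)  # stable sort: same-email items become adjacent

# One list per user; groupby would emit a user twice if the input were unsorted.
groups = [list(items) for _, items in groupby(total_list, key=by_email)]

for group in groups:
    print(group[0]['email'], len(group))
# usera@email.com 2
# userb@email.com 1
# userc@email.com 2
# userd@email.com 1
```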
What I have tried so far is:
from math import ceil

def list_splitter(main_list, size):
    # yield consecutive slices of at most `size` items
    for i in range(0, len(main_list), size):
        yield main_list[i:i + size]

# calculating the needed number of sublists
max_per_batch = 3
number_of_sublists = ceil(len(total_list) / max_per_batch)
# sort the data by email
total_list.sort(key=lambda x: x['email'])
sublists = list(list_splitter(main_list=total_list, size=max_per_batch))
The issue is that with this logic I cannot guarantee that items with the same email value end up in the same sublist. Sorting puts them next to each other, so they usually do, but nothing stops a fixed-size slice boundary from falling in the middle of a run of identical emails. Basically, I need a method that always keeps items with the same email in the same sublist, while the sublist size remains the main condition for the split.
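To make the failure concrete, here is the splitter from the question run on a small, made-up four-item list (the data is illustrative, not from the question). After sorting, userc's two entries sit at indices 2 and 3, and the slice boundary at index 3 cuts them apart.

```python
def list_splitter(main_list, size):
    # yield consecutive slices of at most `size` items
    for i in range(0, len(main_list), size):
        yield main_list[i:i + size]


# Illustrative data only: userc has two entries.
data = [
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
]
data.sort(key=lambda x: x['email'])
sublists = list(list_splitter(data, 3))

print([[d['email'] for d in s] for s in sublists])
# [['usera@email.com', 'userb@email.com', 'userc@email.com'], ['userc@email.com']]
```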
Any ideas?
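One possible approach, sketched under stated assumptions rather than offered as a definitive answer: group the items by email first, then pack whole groups into sublists, starting a new sublist whenever the next group would not fit. `split_by_size_keeping_groups` is a hypothetical name. It assumes a single user with more than `max_size` entries should get its own oversized sublist, since splitting that user would break the grouping rule. No sorting is needed; groups are taken in order of each email's first appearance.

```python
def split_by_size_keeping_groups(items, max_size, key='email'):
    """Hypothetical helper: greedily pack whole groups (items sharing `key`)
    into sublists of at most max_size items. A group larger than max_size
    is emitted as its own oversized sublist (assumption)."""
    # dicts preserve insertion order (Python 3.7+), so groups come out in
    # order of each key's first occurrence in `items`.
    groups = {}
    for item in items:
        groups.setdefault(item[key], []).append(item)

    sublists, current = [], []
    for group in groups.values():
        # Close the current sublist if this whole group would overflow it.
        if current and len(current) + len(group) > max_size:
            sublists.append(current)
            current = []
        current.extend(group)
    if current:
        sublists.append(current)
    return sublists


total_list = [
    {'email': 'usera@email.com', 'id': 1, 'country': 'UK'},
    {'email': 'userb@email.com', 'id': 2, 'country': 'UK'},
    {'email': 'usera@email.com', 'id': 1, 'country': 'Germany'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Italy'},
    {'email': 'userd@email.com', 'id': 4, 'country': 'France'},
    {'email': 'userc@email.com', 'id': 3, 'country': 'Netherland'},
]

for sublist in split_by_size_keeping_groups(total_list, 3):
    print([(d['email'], d['country']) for d in sublist])
```

With the six sample entries, the first sublist holds usera's two entries plus userb, and the second holds userc's two entries plus userd, matching `list_a` and `list_b` up to the order within each sublist. This is a greedy "next fit" pass: it never revisits an earlier sublist, so it can leave some sublists partly empty. Packing groups into the fewest possible sublists is a bin-packing problem; heuristics such as first-fit decreasing usually pack tighter.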
from Split list of dictionaries in separate lists based primarily on list size but secondarily based on condition