Sunday 31 January 2021

How can I iterate until all entries are in a given column?

I am trying to wrap my code in a while loop so that it keeps running until every element of the lists in the Check column also appears in the Source column.

My code so far is:

import pandas as pd

while set_condition:  # set_condition is the stopping condition I still need to define
    newCol = pd.Series(list(set(df['Check']) - set(df['Source'])))  # elements of Check that are not yet in Source
    newList1 = newCol.apply(my_function)  # my_function generates the Check list for each new element, which is why I need a while loop
    df = pd.concat([df, pd.DataFrame({'Source': newCol, 'Check': newList1})], ignore_index=True)  # append the new rows
    df = df.explode('Check')

Here is an example of the process and of how my_function works. Say my initial dataset is:

Source       Check
mouse   [dog, horse, cat]   
horse   [mouse, elephant]   
tiger   []  
elephant [horse, bird]

After exploding the Check column and appending the new elements to Source, I will have

Source       Check
mouse   [dog, horse, cat]   
horse   [mouse, elephant]   
tiger   []  
elephant [horse, bird]
dog     [] # this will be filled in after applying the function
cat     [] # this will be filled in after applying the function
bird    [] # this will be filled in after applying the function
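
The step above can be sketched as follows. This is only an illustration built on the example data, and it assumes Check holds Python lists. The names `checked`, `missing` and `new_rows` are made up for the sketch.

```python
import pandas as pd

# Initial dataset from the example
df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant"],
    "Check": [["dog", "horse", "cat"], ["mouse", "elephant"], [], ["horse", "bird"]],
})

# explode() puts each list element on its own row; empty lists become NaN,
# so dropna() removes them before the comparison.
checked = set(df["Check"].explode().dropna())
missing = sorted(checked - set(df["Source"]))

# Add the missing elements with empty lists, to be filled in by my_function later
new_rows = pd.DataFrame({"Source": missing, "Check": [[] for _ in missing]})
df = pd.concat([df, new_rows], ignore_index=True)
```

Because `missing` is sorted, the new rows come out in alphabetical order rather than in the order shown in the table.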

Every element in the lists should be added to the Source column before applying the function. When I apply the function, I populate the lists of the other elements; so, for example, I can have

Source       Check
mouse   [dog, horse, cat]   
horse   [mouse, elephant]   
tiger   []  
elephant [horse, bird]
dog     [mouse, fish]  # they are filled in
cat     [mouse]
bird    [elephant, penguin]
fish    [dog]

Since fish and penguin are not in Source, I need to run the code again to get the expected output (all the elements in the lists are in the Source column):

Source       Check
mouse   [dog, horse, cat]   
horse   [mouse, elephant]   
tiger   []  
elephant [horse, bird]
dog     [mouse, fish] 
cat     [mouse]
bird    [elephant, penguin]
fish    [dog]
penguin [bird]

As both dog and bird are already in Source, I do not need to apply the function again: all the lists contain only elements that are already in the Source column, so the code can stop running.

I cannot provide the code for my_function, but I hope it is clear how it works, so that you can help me figure out how to set the while condition.

What I would like to do is stop the loop once all the elements in the lists are in the Source column and the function has been applied to populate all the lists.
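
One possible way to express that stopping condition is to loop until the set of elements missing from Source is empty. The sketch below is not the real code: `my_function` here is a hypothetical stand-in (a lookup table copied from the example above), `expand_until_closed` is an invented name, and the lists are kept unexploded in Check, which differs from my code above. It also assumes the function eventually stops producing new elements; otherwise the loop would never end.

```python
import pandas as pd

# Hypothetical stand-in for my_function, built from the example tables
EXAMPLE_LINKS = {
    "dog": ["mouse", "fish"],
    "cat": ["mouse"],
    "bird": ["elephant", "penguin"],
    "fish": ["dog"],
    "penguin": ["bird"],
}

def my_function(name):
    # Returns the Check list for one element (empty if unknown)
    return EXAMPLE_LINKS.get(name, [])

def expand_until_closed(df, func):
    """Add rows until every element of Check also appears in Source."""
    df = df.copy()
    while True:
        # Elements that appear in some Check list (NaN from empty lists dropped)
        checked = set(df["Check"].explode().dropna())
        missing = sorted(checked - set(df["Source"]))
        if not missing:  # stopping condition: nothing left to add
            return df
        # Apply the function only to the new elements and append them
        new_rows = pd.DataFrame({"Source": missing, "Check": [func(m) for m in missing]})
        df = pd.concat([df, new_rows], ignore_index=True)

df = pd.DataFrame({
    "Source": ["mouse", "horse", "tiger", "elephant"],
    "Check": [["dog", "horse", "cat"], ["mouse", "elephant"], [], ["horse", "bird"]],
})
result = expand_until_closed(df, my_function)
```

With the example data this adds dog, cat and bird in the first pass and fish and penguin in the second, then stops on the third pass because nothing is missing. Using `while True` with an explicit exit avoids having to compute the condition once before the loop and again inside it.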

Thank you for any help you can provide.


