Sunday, 9 September 2018

Loop to count values in a pandas df

The code below creates a new column based off the values in others. Specifically, a value in Col['Location'] is selected and assigns a number for every 3 values in Col['Day']. So if 'Home' is selected the first 3 values in Col['Day'] are Mon,Tues,Wed. Therefore, a C1 would go next to them. Thurs,Fri,Sat would be 2 etc.

The problem is selecting one value at a time in Col['Location']. I'd like to perform the same function on the entire column.

import pandas as pd
import numpy as np

d = ({
    'Day' : ['Mon','Tues','Mon','Wed','Thurs','Fri','Mon','Sat','Sun','Tues'],                 
    'Location' : ['Home','Home','Away','Home','Home','Home','Home','Home','Home','Away'],                   
     })

df = pd.DataFrame(data=d)

mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))

df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)

So instead of just selecting Home, I want to do it on everything.

At the moment I'm manually selecting each value, e.g.

mask = df['Location'] == 'Away'

Would a loop be better that performs the code on each value in Col['Location']. Or something that finds each unique value to run it on?

My Intended Output is:

     Day Location Assign
0    Mon     Home     C1
1   Tues     Home     C1
2    Mon     Away     C2
3    Wed     Home     C1
4  Thurs     Home     C3
5    Fri     Home     C3
6    Mon     Home     C1
7    Sat     Home     C3
8    Sun     Home     C4
9   Tues     Away     C2

For reference. As Home is the first value in Col['Location'], the first 3 unique values will be C1. Away is listed next so C2 will go next to the first 3 unique values for Away. Home contains more than 3 values so C3 will go next to Thurs, Fri, Sat. Home goes over 3 again so C4 will go next to Sun in Home

Does this make sense?



from Loop to count values in a pandas df

No comments:

Post a Comment