The code below creates a new column based off the values in others. Specifically, a value in Col['Location'] is selected and assigns a number for every 3 values in Col['Day']. So if 'Home' is selected the first 3 values in Col['Day'] are Mon,Tues,Wed. Therefore, a C1 would go next to them. Thurs,Fri,Sat would be 2 etc.
The problem is selecting one value at a time in Col['Location']. I'd like to perform the same function on the entire column.
import pandas as pd
import numpy as np
d = ({
'Day' : ['Mon','Tues','Mon','Wed','Thurs','Fri','Mon','Sat','Sun','Tues'],
'Location' : ['Home','Home','Away','Home','Home','Home','Home','Home','Home','Away'],
})
df = pd.DataFrame(data=d)
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
So instead of just selecting Home, I want to do it on everything.
At the moment I'm manually selecting each value, e.g.
mask = df['Location'] == 'Away'
Would a loop be better that performs the code on each value in Col['Location']. Or something that finds each unique value to run it on?
My Intended Output is:
Day Location Assign
0 Mon Home C1
1 Tues Home C1
2 Mon Away C2
3 Wed Home C1
4 Thurs Home C3
5 Fri Home C3
6 Mon Home C1
7 Sat Home C3
8 Sun Home C4
9 Tues Away C2
For reference. As Home is the first value in Col['Location'], the first 3 unique values will be C1. Away is listed next so C2 will go next to the first 3 unique values for Away. Home contains more than 3 values so C3 will go next to Thurs, Fri, Sat. Home goes over 3 again so C4 will go next to Sun in Home
Does this make sense?
from Loop to count values in a pandas df
No comments:
Post a Comment