Monday, 5 July 2021

Reassign values in pandas df grouped by time

I have a pandas df below, where the columns are ordered by Value. So -1 will always be first, followed by 0, then 1, then 2 etc.

I want to reassign the Value integers based on the max number of counts for each integer. Specifically, I want to leave -1 as is. For all other numbers, I want 0 to be the most represented for each Period. Then make all other integers a 1.

So if 0 has the most integers for a unique Period, then it's fine. If another integer has a higher count, then re-assign those values as 0 and swap the 0's to 1.

If we leave -1 alone, Period 1 has two 0's, two 1's, and two 2's. So change the 2's to a 1.

Period 2 has two 0's and two 1's so leave as is.

d = {'Item': ["Red", "Blue", "Green", "White", "Black", "Orange", "Yellow", "Brown", "Red", "Blue", "Green", "White", "Black", "Orange", "Yellow", "Brown"],
     'Value': [-1, -1, 0, 0, 1, 1, 2, 2, -1, -1, -1, -1, 0, 0, 1, 1],
     'Period': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
    }

df = pd.DataFrame(data=d)

df['edge']=df.groupby('Period')['Value'].transform(lambda x:x.value_counts().diff().fillna(0).eq(0).all())
df['newval']=df.groupby('Period')['Value'].transform(lambda x:x.value_counts().idxmax())
m1=(df['newval'].ne(0) & df['newval'].eq(df['Value'])) & df['edge'].eq(False)
m2=(df['newval'].ne(0) & df['Value'].eq(0)) & df['edge'].eq(False)
df.loc[m1,'Value']=0
df.loc[m2,'Value']=1
df=df.drop(['newval','edge'],1)

df:

      Item  Value  Period
0      Red     -1       1
1     Blue     -1       1
2    Green      0       1
3    White      0       1
4    Black      1       1
5   Orange      1       1
6   Yellow      2       1
7    Brown      2       1
8      Red     -1       2
9     Blue     -1       2
10   Green     -1       2
11   White     -1       2
12   Black      0       2
13  Orange      0       2
14  Yellow      1       2
15   Brown      1       2

intended output:

      Item  Value  Period
0      Red     -1       1
1     Blue     -1       1
2    Green      0       1
3    White      0       1
4    Black      1       1
5   Orange      1       1
6   Yellow      1       1
7    Brown      1       1
8      Red     -1       2
9     Blue     -1       2
10   Green      0       2
11   White      0       2
12   Black      1       2
13  Orange      1       2
14  Yellow      1       2
15   Brown      1       2


from Reassign values in pandas df grouped by time

No comments:

Post a Comment