I have a pandas df below, where the columns are ordered by Value
. So -1
will always be first, followed by 0
, then 1
, then 2
etc.
I want to reassign the Value
integers based on the max number of counts for each integer. Specifically, I want to leave -1
as is. For all other numbers, I want 0
to be the most represented for each Period
. Then make all other integers a 1
.
So if 0
has the most integers for a unique Period
, then it's fine. If another integer has a higher count, then re-assign those values as 0 and swap the 0's
to 1
.
If we leave -1
alone, Period 1
has two 0's
, two 1's
, and two 2's
. So change the 2's
to a 1
.
Period 2
has two 0's
and two 1's
so leave as is.
d = {'Item': ["Red", "Blue", "Green", "White", "Black", "Orange", "Yellow", "Brown", "Red", "Blue", "Green", "White", "Black", "Orange", "Yellow", "Brown"],
'Value': [-1, -1, 0, 0, 1, 1, 2, 2, -1, -1, -1, -1, 0, 0, 1, 1],
'Period': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
}
df = pd.DataFrame(data=d)
df['edge']=df.groupby('Period')['Value'].transform(lambda x:x.value_counts().diff().fillna(0).eq(0).all())
df['newval']=df.groupby('Period')['Value'].transform(lambda x:x.value_counts().idxmax())
m1=(df['newval'].ne(0) & df['newval'].eq(df['Value'])) & df['edge'].eq(False)
m2=(df['newval'].ne(0) & df['Value'].eq(0)) & df['edge'].eq(False)
df.loc[m1,'Value']=0
df.loc[m2,'Value']=1
df=df.drop(['newval','edge'],1)
df:
Item Value Period
0 Red -1 1
1 Blue -1 1
2 Green 0 1
3 White 0 1
4 Black 1 1
5 Orange 1 1
6 Yellow 2 1
7 Brown 2 1
8 Red -1 2
9 Blue -1 2
10 Green -1 2
11 White -1 2
12 Black 0 2
13 Orange 0 2
14 Yellow 1 2
15 Brown 1 2
intended output:
Item Value Period
0 Red -1 1
1 Blue -1 1
2 Green 0 1
3 White 0 1
4 Black 1 1
5 Orange 1 1
6 Yellow 1 1
7 Brown 1 1
8 Red -1 2
9 Blue -1 2
10 Green 0 2
11 White 0 2
12 Black 1 2
13 Orange 1 2
14 Yellow 1 2
15 Brown 1 2
from Reassign values in pandas df grouped by time
No comments:
Post a Comment