I've got a df below with numerous duplicate values. Using below, I'm aiming to drop rows where Value is unique compared to the previous rows and Group is equal to C.
Further, where this occurs I want to remove all previous duplicate rows.
d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red", "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E", "B", "B", "D", "D"],
}
df = pd.DataFrame(data=d)
mask = (df['Item'].isin(['Green'])) & (df.Value.eq(df.Value.shift(-1)))
df = df[~mask]
out:
Item Value Group
0 Red 1 A
1 Red 1 B
2 Red 1 B
4 Green 2 D
5 Red 3 D
6 Red 3 A
7 Red 3 B
11 Green 4 E
12 Red 5 B
13 Red 5 B
14 Red 5 D
15 Green 6 D
intended output:
Item Value Group
0 Red 1 A
4 Green 2 D
5 Red 3 D
6 Red 3 A
9 Green 4 D
10 Green 4 E
11 Green 4 E
12 Red 5 B
13 Red 5 B
14 Red 5 D
15 Green 6 D
@Anurag
out=df.shift(-1)
cond=df['Value'].ne(out['Value']) & out['Group'].eq('C')
index=list(flatten((df[cond].index+1).map(range)))
to_drop=df.loc[index]
to_drop=to_drop[to_drop.duplicated(['Item','Value'])].index.tolist()+(df[cond].index+1).tolist()
df=df.drop(to_drop)
out:
Item Value Group
5 Red 3 D
9 Green 4 D
10 Green 4 E
11 Green 4 E
12 Red 5 B
13 Red 5 B
14 Red 5 D
15 Green 6 D
from Drop rows using two conditionals - pandas
No comments:
Post a Comment