I've got a df (below) with numerous duplicate values. I'm aiming to drop rows where Value is new compared with the previous rows and Group is equal to C. Further, where this occurs, I want to remove all previous duplicate rows. In the sample data, rows 3 and 8 meet the first condition, and the intended output also drops rows 1, 2 and 7.
import pandas as pd

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red",
              "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E",
               "B", "B", "D", "D"],
     }
df = pd.DataFrame(data=d)
# first attempt: flag Green rows whose next row has the same Value, then drop them
mask = (df['Item'].isin(['Green'])) & (df.Value.eq(df.Value.shift(-1)))
df = df[~mask]
Output:
     Item Value Group
0     Red     1     A
1     Red     1     B
2     Red     1     B
4   Green     2     D
5     Red     3     D
6     Red     3     A
7     Red     3     B
11  Green     4     E
12    Red     5     B
13    Red     5     B
14    Red     5     D
15  Green     6     D
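The mask in the first attempt tests a different condition from the one described above: it flags Green rows whose next row has the same Value, rather than rows where Value changes from the previous row and Group is C. A small sketch comparing the two masks on the sample data (the names first_attempt and trigger are just illustrative):

```python
import pandas as pd

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red",
              "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E",
               "B", "B", "D", "D"]}
df = pd.DataFrame(data=d)

# Mask from the first attempt: Green rows whose *next* row has the same Value.
first_attempt = df["Item"].isin(["Green"]) & df["Value"].eq(df["Value"].shift(-1))

# Condition described in the question: Value changes from the *previous* row
# and Group is "C".
trigger = df["Value"].ne(df["Value"].shift()) & df["Group"].eq("C")

print(df[first_attempt].index.tolist())  # [3, 8, 9, 10]
print(df[trigger].index.tolist())        # [3, 8]
```

So the first attempt also removes rows 9 and 10, which the intended output keeps, and never looks at Group at all.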
Intended output:
     Item Value Group
0     Red     1     A
4   Green     2     D
5     Red     3     D
6     Red     3     A
9   Green     4     D
10  Green     4     E
11  Green     4     E
12    Red     5     B
13    Red     5     B
14    Red     5     D
15  Green     6     D
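The phrase "previous duplicate rows" leaves open exactly which rows go. One reading that reproduces the intended output on this sample, and it is only an assumption inferred from the example, is: drop each trigger row, the row directly above it, and any earlier rows identical to that row in Item, Value and Group. A minimal sketch of that reading, using a hypothetical helper name:

```python
import pandas as pd

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red",
              "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E",
               "B", "B", "D", "D"]}
df = pd.DataFrame(data=d)


def drop_trigger_and_prev_dups(frame, cols=("Item", "Value", "Group")):
    """Hypothetical helper (assumed rule, inferred from the sample):
    for every row whose Value differs from the row above and whose Group
    is "C", drop that row, the row directly above it, and any earlier rows
    identical to that row on `cols`."""
    cols = list(cols)
    trigger = frame["Value"].ne(frame["Value"].shift()) & frame["Group"].eq("C")
    to_drop = set()
    for pos, hit in enumerate(trigger.to_numpy()):
        if not hit:
            continue
        to_drop.add(frame.index[pos])
        if pos == 0:
            continue  # there is no row above the first one
        earlier = frame.iloc[:pos]            # all rows above the trigger
        prev = frame.iloc[pos - 1][cols]      # the row directly above it
        same = earlier[cols].eq(prev).all(axis=1)
        to_drop.update(earlier.index[same.to_numpy()])
    return frame.drop(index=list(to_drop))


result = drop_trigger_and_prev_dups(df)
print(result)
```

On this sample the helper keeps rows 0, 4, 5, 6, 9, 10, 11, 12, 13, 14 and 15, matching the intended output; since the rule is inferred from one example, it should be checked against more data before relying on it.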
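@Anurag, running your suggestion on the original df gives the output below, which also drops rows 0, 4 and 6:
from itertools import chain

df = pd.DataFrame(data=d)  # start again from the unfiltered frame
# shift(-1) lines each row up with the row below it
out = df.shift(-1)
# True on the row just before a C row whose Value differs (rows 2 and 7 here)
cond = df['Value'].ne(out['Value']) & out['Group'].eq('C')
# flatten range(3) and range(8) into one list of row labels
index = list(chain.from_iterable((df[cond].index + 1).map(range)))
to_drop = df.loc[index]
to_drop = to_drop[to_drop.duplicated(['Item', 'Value'])].index.tolist() + (df[cond].index + 1).tolist()
df = df.drop(to_drop)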
Output:
     Item Value Group
5     Red     3     D
9   Green     4     D
10  Green     4     E
11  Green     4     E
12    Red     5     B
13    Red     5     B
14    Red     5     D
15  Green     6     D
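Printing the intermediate values of the suggested code suggests why rows 0, 4 and 6 are lost. This sketch assumes flatten only needs to flatten one level, so it uses itertools.chain.from_iterable over a list of ranges in its place:

```python
from itertools import chain

import pandas as pd

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red",
              "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E",
               "B", "B", "D", "D"]}
df = pd.DataFrame(data=d)

out = df.shift(-1)
cond = df["Value"].ne(out["Value"]) & out["Group"].eq("C")  # rows 2 and 7

# range(3) and range(8) overlap, so labels 0, 1 and 2 appear twice.
index = list(chain.from_iterable(range(i) for i in df[cond].index + 1))
print(index)  # [0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7]

# duplicated() is evaluated on this list, repeats included.
candidates = df.loc[index]
dup = candidates.duplicated(["Item", "Value"])
flagged = candidates.index[dup.to_numpy()].tolist()
print(flagged)  # [1, 2, 0, 1, 2, 4, 6, 7]

dropped = sorted(set(flagged) | set(df[cond].index + 1))
print(dropped)  # [0, 1, 2, 3, 4, 6, 7, 8]
```

Row 0 is flagged because its second copy (from range(8)) duplicates its first; row 4 is flagged as a duplicate of the trigger row 3 (both Green, 2), and row 6 as a duplicate of row 5 (both Red, 3), since only Item and Value are compared.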
from Drop rows using two conditionals - pandas