Friday 9 July 2021

Drop rows using two conditionals - pandas

I've got a df below with numerous duplicate values. Using below, I'm aiming to drop rows where Value is unique compared to the previous rows and Group is equal to C.

Further, where this occurs I want to remove all previous duplicate rows.

d = {'Item': ["Red", "Red", "Red", "Green", "Green", "Red", "Red", "Red", "Green", "Green", "Green", "Green", "Red", "Red", "Red", "Green"],
     'Value': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6],
     'Group': ["A", "B", "B", "C", "D", "D", "A", "B", "C", "D", "E", "E", "B", "B", "D", "D"],
    }

df = pd.DataFrame(data=d)

mask = (df['Item'].isin(['Green'])) & (df.Value.eq(df.Value.shift(-1)))

df = df[~mask]

out:

     Item  Value Group
0     Red      1     A
1     Red      1     B
2     Red      1     B
4   Green      2     D
5     Red      3     D
6     Red      3     A
7     Red      3     B
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

intended output:

     Item  Value Group
0     Red      1     A
4   Green      2     D
5     Red      3     D
6     Red      3     A
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D

@Anurag

out=df.shift(-1)
cond=df['Value'].ne(out['Value']) & out['Group'].eq('C')
index=list(flatten((df[cond].index+1).map(range)))
to_drop=df.loc[index]
to_drop=to_drop[to_drop.duplicated(['Item','Value'])].index.tolist()+(df[cond].index+1).tolist()
df=df.drop(to_drop)

out:

     Item  Value Group
5     Red      3     D
9   Green      4     D
10  Green      4     E
11  Green      4     E
12    Red      5     B
13    Red      5     B
14    Red      5     D
15  Green      6     D


from Drop rows using two conditionals - pandas

No comments:

Post a Comment