I am aiming to the below output.
| A | B | C | D | E | F |
|---|---|---|---|---|---|
| 0000 | ZZZ | 987 | QW1 | 8 | first three-four col and offset |
| 0000 | ZZZ | 987 | QW1 | -8 | first three-four col and offset |
| 0000 | ZZZ | 987 | QW1 | -8 | first or no match |
| 1111 | AAA | 123 | AB1 | 1 | first three-four col and offset |
| 1111 | AAA | 123 | CD1 | -1 | first three-four col and offset |
| 2222 | BBB | 456 | EF1 | -4 | first three-four col and offset |
| 2222 | BBB | 456 | GH1 | -1 | first three-four col and offset |
| 2222 | BBB | 456 | IL1 | 5 | first three-four col and offset |
| 3333 | CCC | 789 | MN1 | 2 | first two col and offset |
| 3333 | CCC | 101 | MN1 | -2 | first two col and offset |
| 4444 | DDD | 121 | UYT | 6 | first two col and offset |
| 4444 | DDD | 131 | FB1 | -5 | first two col and offset |
| 4444 | DDD | 141 | UYT | -1 | first two col and offset |
| 5555 | EEE | 151 | CB1 | 3 | first two col and offset |
| 5555 | EEE | 161 | CR1 | -3 | first two col and offset |
| 5555 | EEE | 161 | CR1 | -5 | first or no match |
| 6666 | FFF | 111 | CB1 | 4 | first or no match |
| 7777 | GGG | 222 | ZB1 | 10.5 | first three-four col and small offset |
| 7777 | GGG | 222 | ZB1 | -10 | first three-four col and small offset |
1st rule) the first three columns must equal each other - regardless of the fourth, which can either be equal or not. Each combination must have the associated numbers (col E) offset to zero (can be from 2 to X records combined). IT SHOULD WORK EVEN WITH UNEVEN ENTRIES.
2nd rule) the first two columns must equal each other - regardless of the fourth, which can either be equal or not. Each combination must have the associated numbers (col E) offset to zero (can be from 2 to X records combined). IT SHOULD WORK EVEN WITH UNEVEN ENTRIES.
3rd rule) no match.
4th rule) the first three columns must equal each other - regardless of the fourth, which can either be equal or not. Each combination can have a difference of 0.5 AT MOST (col E) and NO offset to zero (can be from 2 to X records combined). IT SHOULD WORK EVEN WITH UNEVEN ENTRIES.
Please see my code below.
Through the below code, I am able to get a similar result, however, this way does not work with uneven entries, for instance the first three rows would result as following:
| A | B | C | D | E | F |
|---|---|---|---|---|---|
| 0000 | ZZZ | 987 | QW1 | 8 | first or no match |
| 0000 | ZZZ | 987 | QW1 | -8 | first or no match |
| 0000 | ZZZ | 987 | QW1 | -8 | first or no match |
Instead of the following:
| A | B | C | D | E | F |
|---|---|---|---|---|---|
| 0000 | ZZZ | 987 | QW1 | 8 | first three-four col and offset |
| 0000 | ZZZ | 987 | QW1 | -8 | first three-four col and offset |
| 0000 | ZZZ | 987 | QW1 | -8 | first or no match |
Code so far:
m1 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').eq(0) # Rule 1
m2 = df.groupby(['A', 'B'])['E'].transform('sum').eq(0) # Rule 2
m3 = df.groupby(['A', 'B', 'C'])['E'].transform('sum').abs().le(0.5) # Rule 4
df['new'] = np.select([m1, m2, m3], ['first three-four col and offset',
'first two col and offset', 'first three-four col and small offset'], 'first or no match')
from Pandas - Updating columns based on several conditions - group by method
No comments:
Post a Comment