Saturday 6 March 2021

How to perform cell by cell comparison using pandas?

Suppose I have a df like the following:

col1   | type   | date_1 | date_2 | date_3 |.... | date_n
ab     |   A    |  -10   |        |  -10
ab     |   B    |  100   |   99   |  -12
cd     |   A    |   0    |  -25   |   6
cd     |   B    |  -1    |   8    |  -34
ab     |   A    |   98   |  -9    |   0
ab     |   B    |  -7    |  -2    |   0

The first step is to remove all non-negative numbers (positive numbers and 0), leaving those cells blank.

The df should now look like this:

col1   | type   | date_1 | date_2 | date_3 | .... | date_n
ab     |   A    |  -10   |        |  -10   |
ab     |   B    |        |        |  -12   |
cd     |   A    |        |  -25   |        |
cd     |   B    |  -1    |        |  -34   |
ab     |   A    |        |  -9    |        |
ab     |   B    |  -7    |  -2    |        |
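
One possible sketch of this first step, assuming blank cells are stored as NaN and that the first two columns are always col1 and type, so the date columns can be selected by position rather than by name. The column names date_1 to date_3 below are only placeholders for the sample data.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are modelled as NaN (an assumption).
df = pd.DataFrame({
    "col1": ["ab", "ab", "cd", "cd", "ab", "ab"],
    "type": ["A", "B", "A", "B", "A", "B"],
    "date_1": [-10, 100, 0, -1, 98, -7],
    "date_2": [np.nan, 99, -25, 8, -9, -2],
    "date_3": [-10, -12, 6, -34, 0, 0],
})

# Select the date columns by position, so their headers never need to be known.
values = df.iloc[:, 2:]

# Keep only strictly negative numbers; 0, positives and NaN all become NaN.
step1 = pd.concat([df.iloc[:, :2], values.where(values < 0)], axis=1)
print(step1)
```

Because `NaN < 0` evaluates to False, cells that were already blank stay blank.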

The second step is to compare the numbers in each 'date' col between the 'type' A row and the 'type' B row of each pair:

  • If the 'type' A row has a negative number and the 'type' B row is blank, remove the negative number from the 'type' A row for that 'date' col

  • If the 'type' B row has a negative number and the 'type' A row is blank, do nothing

  • If both types are blank, do nothing

After this step, the df should look like this:

col1   | type   | date_1 | date_2 | date_3 | .... | date_n
ab     |   A    |        |        |  -10   |
ab     |   B    |        |        |  -12   |
cd     |   A    |        |        |        |
cd     |   B    |  -1    |        |  -34   |
ab     |   A    |        |  -9    |        |
ab     |   B    |  -7    |  -2    |        |
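
The second step could be sketched as follows. This assumes the rows strictly alternate A, B, A, B, so that each pair is two consecutive rows. Grouping on col1 alone would not work here, because "ab" appears in two separate pairs. The frame below is the result of the first step, typed in by hand with NaN for blanks.

```python
import numpy as np
import pandas as pd

nan = np.nan
# State of the df after the first step (blank cells modelled as NaN).
step1 = pd.DataFrame({
    "col1": ["ab", "ab", "cd", "cd", "ab", "ab"],
    "type": ["A", "B", "A", "B", "A", "B"],
    "date_1": [-10, nan, nan, -1, nan, -7],
    "date_2": [nan, nan, -25, nan, -9, -2],
    "date_3": [-10, -12, nan, -34, nan, nan],
})

# Assumption: even positions hold 'type' A rows, odd positions hold 'type' B rows.
assert (step1["type"].iloc[0::2] == "A").all()
assert (step1["type"].iloc[1::2] == "B").all()

vals = step1.iloc[:, 2:]            # date columns, selected by position
a = vals.iloc[0::2].to_numpy()      # one row per pair: the 'type' A values
b = vals.iloc[1::2].to_numpy()      # one row per pair: the 'type' B values

# Blank the A cell wherever the matching B cell is blank; B rows are untouched.
out = vals.to_numpy().copy()
out[0::2] = np.where(np.isnan(b), np.nan, a)

step2 = pd.concat(
    [step1.iloc[:, :2], pd.DataFrame(out, index=vals.index, columns=vals.columns)],
    axis=1,
)
print(step2)
```

The other two rules need no code: when A is already blank, or both are blank, nothing changes.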

The final step:

  • If both types are negative in the current 'date' col, then for each col1 pair (ab, cd, ab), check the values immediately to the left (the previous 'date' col) in the same 'type' A and 'type' B rows:

    1) If the left-hand values of both types A and B are blank, remove the negative number from the current 'type' A cell and keep the -ve number of 'type' B

    2) If only one of the left-hand values is blank, remove the negative number from the current 'type' B cell and keep the -ve number of 'type' A

The final_df should then look like this:

col1   | type   | date_1 | date_2 | date_3 | .... | date_n
ab     |   A    |        |        |        |
ab     |   B    |        |        |  -12   |
cd     |   A    |        |        |        |
cd     |   B    |  -1    |        |  -34   |
ab     |   A    |        |  -9    |        |
ab     |   B    |  -7    |        |        |

For the final step, the comparison should start from 'date_2', since 'date_1' has no column to its left.
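
A rough sketch of the final step, under a few assumptions that the description above does not settle: rows strictly alternate A, B; every left-hand lookup uses the values as they stood after the second step (not values already changed during this pass); "either blank" is read as exactly one of the two being blank; and when neither left-hand value is blank, nothing is changed.

```python
import numpy as np
import pandas as pd

nan = np.nan
# State of the df after the second step (blank cells modelled as NaN).
step2 = pd.DataFrame({
    "col1": ["ab", "ab", "cd", "cd", "ab", "ab"],
    "type": ["A", "B", "A", "B", "A", "B"],
    "date_1": [nan, nan, nan, -1, nan, -7],
    "date_2": [nan, nan, nan, nan, -9, -2],
    "date_3": [-10, -12, nan, -34, nan, nan],
})

vals = step2.iloc[:, 2:]             # date columns, selected by position
a = vals.iloc[0::2].to_numpy()       # 'type' A values, one row per pair
b = vals.iloc[1::2].to_numpy()       # 'type' B values, one row per pair

# Current cells start at the second date column; their left neighbours
# are the same arrays shifted by one column.
cur_a, cur_b = a[:, 1:], b[:, 1:]
left_a_blank = np.isnan(a[:, :-1])
left_b_blank = np.isnan(b[:, :-1])

both_neg = (cur_a < 0) & (cur_b < 0)
drop_a = both_neg & left_a_blank & left_b_blank      # rule 1: both left cells blank
drop_b = both_neg & (left_a_blank ^ left_b_blank)    # rule 2: exactly one left cell blank

new_a, new_b = a.copy(), b.copy()
new_a[:, 1:] = np.where(drop_a, np.nan, cur_a)
new_b[:, 1:] = np.where(drop_b, np.nan, cur_b)

# Interleave the A and B rows back into their original positions.
out = vals.to_numpy().copy()
out[0::2], out[1::2] = new_a, new_b
final_df = pd.concat(
    [step2.iloc[:, :2], pd.DataFrame(out, index=vals.index, columns=vals.columns)],
    axis=1,
)
print(final_df)
```

If the rules are instead meant to cascade, so that a change in one column affects the check for the next, the vectorised comparison would need to become a left-to-right loop over the columns.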

What would be the best way to solve this problem? Any help would be greatly appreciated!

Note: I cannot refer to the date column headers by name when manipulating the data, because they keep changing.



