Tuesday, 1 September 2020

Shift groupby by an unknown number of line

I have a pivot table coming from a dataframe with the same structure as below:

raw = pd.DataFrame([[123456,datetime(2020,7,1), "XXX",'A',1 ],
                   [123456,datetime(2020,7,1), "XXX",'B',2 ],
                   [123456,datetime(2020,7,1), "XXX",'C',3 ],
                   [123456,datetime(2020,7,1), "YYY",'A',4 ],
                   [123456,datetime(2020,7,1), "YYY",'B',5 ],
                   [123456,datetime(2020,7,1), "YYY",'C',6 ],
                   [123456,datetime(2020,7,2), "XXX",'A',7 ],
                   [123456,datetime(2020,7,2), "XXX",'B',8 ],
                   [123456,datetime(2020,7,2), "XXX",'C',float('NaN'),
                   [123456,datetime(2020,7,2), "YYY",'A',9 ],
                   [123456,datetime(2020,7,2), "YYY",'B',10 ],
                   [123456,datetime(2020,7,2), "YYY",'C',11 ],
                   [789012,datetime(2020,7,1), "XXX",'A',12 ],
                   [789012,datetime(2020,7,1), "XXX",'B',13 ],
                   [789012,datetime(2020,7,1), "XXX",'C',14 ],
                   [789012,datetime(2020,7,2), "XXX",'A',15 ],
                   [789012,datetime(2020,7,2), "XXX",'B',16 ],
                   [789012,datetime(2020,7,2), "XXX",'C',17 ],
                           ] , columns=['GROUP_ID','DATE', 'REFERENCE', 'NAME', 'VALUE'])

pt = raw.pivot_table(index=['GROUP_ID', 'DATE', 'REFERENCE'], columns=['NAME'], values=['VALUE'])

                              VALUE
                    NAME      A    B    C
GROUP_ID      DATE  REFERENCE           
123456  2020-07-01  XXX       1.0  2.0  3.0
                    YYY       4.0  5.0  6.0
        2020-07-02  XXX       7.0  8.0  NaN
                    YYY       9.0 10.0 11.0
789012  2020-07-01  XXX      12.0 13.0 14.0
        2020-07-02  XXX      15.0 16.0 17.0

The idea is to create a column ("VALUE_PREV", "C") where i can have the value of the previous date inside each GROUP_ID. If i do pt[("VALUE_PREV","C")] = pt["VALUE"].groupby(level=0)["C"].shift() i have :

                                VALUE          VALUE_PREV
                         NAME   A    B    C      C
GROUP_ID      DATE  REFERENCE               
123456  2020-07-01  XXX        1.0   2.0   3.0  NaN
                    YYY        4.0   5.0   6.0  3.0
        2020-07-02  XXX        7.0   8.0   NaN  6.0
                    YYY        9.0  10.0  11.0  NaN
789012  2020-07-01  XXX       12.0  13.0  14.0  NaN
        2020-07-02  XXX       15.0  16.0  17.0 14.0

So here on the last line in the VALUE_PREVcolumn 14.0 is well placed but 3.0 and 6.0 should be one line below. The solution could be to do shift(2) but the number of REFERENCE by date varies depending on the GROUP_ID and then 14.0 would be one line below. So i don't know how to get that number of REFERENCE per GROUP_ID to adjust the shift() dynamically.



from Shift groupby by an unknown number of line

No comments:

Post a Comment