Friday, 20 November 2020

Rolling sum on a column while weighting by other column and relative position

I have a table like this :

import pandas as pd
values = [0,0,0,2000,0,0,700,0,0,530,1000,820,0,0,200]
durations = [0,0,0,12,0,0,8,0,0,2,5,15,0,0,3]

ex = pd.DataFrame({'col_to_roll' : values, 'max_duration': durations})

    col_to_roll  max_duration
0             0             0
1             0             0
2             0             0
3          2000            12
4             0             0
5             0             0
6           700             8
7             0             0
8             0             0
9           530             2
10         1000             5
11          820            15
12            0             0
13            0             0
14          200             3

For each row position i, I want to do a rolling sum of col_to_roll between indexes i-7 and i-4 (both included). The caveat is that I want the values "further in the past" to be counted more, depending on the column max_duration (which tells for how many timesteps in the future that value can still have an effect).
There's a higher bound which is the remaining timesteps to be counted (min 1, max 4). So if I'm on row number 7 doing the roll-up sum: the value on row number 1 will be counted min(max_duration[1],4), the value on row number 2 will be counted min(max_duration[2],3) etc.

I could do it the brute force way :

new_col = []
for i in range(7,len(ex)) : 
    rolled_val = sum([ex.iloc[j].col_to_roll*min(ex.iloc[j].max_duration , i-j+1-4) \
                     for j in range(i-7,i-3)])
    new_col.append(rolled_val)
ex['rolled_col'] = [np.nan]*7+new_col

Which lands the following results from the example above :

        col_to_roll  max_duration  rolled_col
    0             0             0         NaN
    1             0             0         NaN
    2             0             0         NaN
    3          2000            12         NaN
    4             0             0         NaN
    5             0             0         NaN
    6           700             8         NaN
    7             0             0      2000.0
    8             0             0      4000.0
    9           530             2      6000.0
    10         1000             5      8700.0
    11          820            15      1400.0
    12            0             0      2100.0
    13            0             0      3330.0
    14          200             3      2060.0

That being said, I'd appreciate a more elegant (and more importantly, more efficient) way to get this result with some pandas magic.



from Rolling sum on a column while weighting by other column and relative position

No comments:

Post a Comment