I have a table like this :
import pandas as pd
values = [0,0,0,2000,0,0,700,0,0,530,1000,820,0,0,200]
durations = [0,0,0,12,0,0,8,0,0,2,5,15,0,0,3]
ex = pd.DataFrame({'col_to_roll' : values, 'max_duration': durations})
col_to_roll max_duration
0 0 0
1 0 0
2 0 0
3 2000 12
4 0 0
5 0 0
6 700 8
7 0 0
8 0 0
9 530 2
10 1000 5
11 820 15
12 0 0
13 0 0
14 200 3
For each row position i
, I want to do a rolling sum of col_to_roll
between indexes i-7
and i-4
(both included). The caveat is that I want the values "further in the past" to be counted more, depending on the column max_duration
(which tells for how many timesteps in the future that value can still have an effect).
There's a higher bound which is the remaining timesteps to be counted (min 1, max 4). So if I'm on row number 7 doing the roll-up sum: the value on row number 1 will be counted min(max_duration[1],4)
, the value on row number 2 will be counted min(max_duration[2],3)
etc.
I could do it the brute force way :
new_col = []
for i in range(7,len(ex)) :
rolled_val = sum([ex.iloc[j].col_to_roll*min(ex.iloc[j].max_duration , i-j+1-4) \
for j in range(i-7,i-3)])
new_col.append(rolled_val)
ex['rolled_col'] = [np.nan]*7+new_col
Which lands the following results from the example above :
col_to_roll max_duration rolled_col
0 0 0 NaN
1 0 0 NaN
2 0 0 NaN
3 2000 12 NaN
4 0 0 NaN
5 0 0 NaN
6 700 8 NaN
7 0 0 2000.0
8 0 0 4000.0
9 530 2 6000.0
10 1000 5 8700.0
11 820 15 1400.0
12 0 0 2100.0
13 0 0 3330.0
14 200 3 2060.0
That being said, I'd appreciate a more elegant (and more importantly, more efficient) way to get this result with some pandas magic.
from Rolling sum on a column while weighting by other column and relative position
No comments:
Post a Comment