I am calculating the rolling slope (gradient) of a column in a pandas DataFrame with a datetime index, and I am looking for suggestions to reduce computation time over the current approach using .rolling and .apply (detailed below).
I have two additional requirements: a minimum number of observations to include in the rolling calculation, and a maximum window size (see the example below):
Example: minimum number of points = 3, maximum window size = 7 days

    datetime             values  intended_window          gradient
    01-01-2010 00:00:00  10      NaN                      NaN
    01-02-2010 00:00:00  11      NaN                      NaN
    01-03-2010 00:00:00  12      [10,11,12]               0.04167
    01-04-2010 00:00:00  13      [10,11,12,13]            0.04167
    01-05-2010 00:00:00  14      [10,11,12,13,14]         0.04167
    01-06-2010 00:00:00  15      [10,11,12,13,14,15]      0.04167
    01-07-2010 00:00:00  16      [10,11,12,13,14,15,16]   0.04167
    01-08-2010 00:00:00  17      [11,12,13,14,15,16,17]   0.04167
    01-09-2010 00:00:00  18      [12,13,14,15,16,17,18]   0.04167
    01-10-2010 00:00:00  19      [13,14,15,16,17,18,19]   0.04167
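For reference, the gradient column corresponds to an increase of 1 unit per day with x measured in hours, i.e. 1/24 ≈ 0.04167 per hour. A quick sanity check on the first complete window (a sketch; the hourly x-axis matches the conversion in get_slope below):

```python
import numpy as np
from scipy.stats import linregress

# First complete 7-day window from the table: values 10..16 at daily spacing
x = np.arange(7) * 24.0        # elapsed time in hours: 0, 24, ..., 144
y = np.arange(10.0, 17.0)      # values 10, 11, ..., 16

slope = linregress(x, y).slope  # 1 unit per 24 hours -> 1/24 ≈ 0.04167
```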
The current approach is effectively:
gradient = df['values'].rolling(window='7d', min_periods=3).apply(get_slope, raw=False)
where
    import numpy as np
    from scipy.stats import linregress

    def get_slope(series):
        # With raw=False, `series` is a pandas Series that keeps the datetime index
        series = series.dropna()
        min_date = series.index.min()
        x = (series.index - min_date).total_seconds() / 60 / 60  # elapsed hours
        y = np.array(series)
        slope, intercept, r_value, p_value, std_err = linregress(x, y)
        return slope
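To make the approach reproducible end to end, here is a self-contained sketch that rebuilds the example frame from the table above (restating get_slope so the block runs on its own; the column name 'values' is taken from the snippet):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def get_slope(series):
    # With raw=False, `series` is a pandas Series that keeps the datetime index
    series = series.dropna()
    x = (series.index - series.index.min()).total_seconds() / 3600  # hours
    return linregress(x, np.array(series)).slope

# Rebuild the example data: values 10..19 at daily spacing
df = pd.DataFrame({'values': np.arange(10.0, 20.0)},
                  index=pd.date_range('2010-01-01', periods=10, freq='D'))

gradient = df['values'].rolling(window='7d', min_periods=3).apply(get_slope, raw=False)
# First two entries are NaN (fewer than 3 points); the rest ≈ 0.04167 (= 1/24)
```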
Does anyone have a suggestion on how this could be radically sped up? When the maximum window size is increased, the computation time increases significantly. Is there any way to vectorise this calculation?
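One possible direction (a sketch, not part of the original question): the OLS slope has the closed form slope = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²), which is invariant to shifting x, so the per-window linregress calls can be replaced by a fixed number of vectorised rolling sums. The rolling_slope helper below is a hypothetical illustration and assumes the values column contains no NaNs:

```python
import numpy as np
import pandas as pd

def rolling_slope(series, window='7d', min_periods=3):
    """Rolling OLS slope via the closed-form formula
    (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2),
    computed with rolling sums instead of per-window linregress calls."""
    # x: elapsed hours from the first timestamp; any fixed origin gives the
    # same slope because the formula is shift-invariant in x
    x = pd.Series((series.index - series.index[0]).total_seconds() / 3600.0,
                  index=series.index)
    y = series

    roll = lambda s: s.rolling(window=window, min_periods=min_periods)
    n = roll(y).count()
    sx, sy = roll(x).sum(), roll(y).sum()
    sxy, sxx = roll(x * y).sum(), roll(x * x).sum()

    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)
```

On large frames this swaps one Python-level linregress call per row for a handful of C-level rolling aggregations, which is where the speed-up would come from.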
from Pandas Rolling Gradient - Improving/Reducing Computation Time