Saturday, 19 June 2021

Pandas Rolling Gradient - Improving/Reducing Computation Time

I am calculating the rolling slope (gradient) of a column in a pandas DataFrame with a datetime index. I am looking for suggestions to reduce the computation time of my current approach, which uses .rolling and .apply (detailed below).

I have two additional requirements: a minimum number of observations to include in each rolling calculation, and a maximum window size (see the example below):

Example: minimum number of points = 3, maximum window size = 7 days (dates are MM-DD-YYYY with one observation per day; gradient is in value units per hour)

datetime              values  intended_window           gradient
01-01-2010 00:00:00   10      np.nan                    NaN
01-02-2010 00:00:00   11      np.nan                    NaN
01-03-2010 00:00:00   12      [10,11,12]                0.04167
01-04-2010 00:00:00   13      [10,11,12,13]             0.04167
01-05-2010 00:00:00   14      [10,11,12,13,14]          0.04167
01-06-2010 00:00:00   15      [10,11,12,13,14,15]       0.04167
01-07-2010 00:00:00   16      [10,11,12,13,14,15,16]    0.04167
01-08-2010 00:00:00   17      [11,12,13,14,15,16,17]    0.04167
01-09-2010 00:00:00   18      [12,13,14,15,16,17,18]    0.04167
01-10-2010 00:00:00   19      [13,14,15,16,17,18,19]    0.04167
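For reference, the example data can be rebuilt as below. This is a sketch that assumes the dates are MM-DD-YYYY (ten consecutive days in January 2010). It shows that a '7d' window in pandas covers (t - 7 days, t], so the number of points per window matches the intended_window column, and that the first qualifying window has a slope of 1/24 per hour (the value rises by 1 every 24 hours), which is the 0.04167 in the table.

```python
import numpy as np
import pandas as pd

# Rebuild the example table: ten daily timestamps with values 10..19
idx = pd.date_range("2010-01-01", periods=10, freq="D")
df = pd.DataFrame({"values": np.arange(10, 20, dtype=float)}, index=idx)

# Observations per '7d' window; time-based windows are closed on the right,
# so each window covers (t - 7 days, t]
counts = df["values"].rolling("7d").count()
print(counts.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 7.0, 7.0, 7.0]

# Least-squares slope of the first qualifying window (three points), x in hours
x = np.array([0.0, 24.0, 48.0])
slope = np.polyfit(x, df["values"].iloc[:3].to_numpy(), 1)[0]
print(round(slope, 5))  # 0.04167
```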

The current approach is effectively:

gradient = df['values'].rolling(window='7d', min_periods=3).apply(get_slope, raw=False)

where

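import numpy as np
import pandas as pd
from scipy.stats import linregress

def get_slope(series):
  # With raw=False each window arrives as a pandas Series with its datetime index
  series = series.dropna()
  min_date = series.index.min()
  # x: hours elapsed since the first timestamp in the window
  x = (series.index - min_date).total_seconds() / 60 / 60
  y = np.asarray(series)
  slope, intercept, r_value, p_value, std_err = linregress(x, y)
  return slope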
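For reference, the slope that linregress returns inside get_slope is the ordinary least-squares estimate over the n points of the window. Replacing every x_i by x_i - c leaves each x_i - \bar{x} unchanged, so subtracting min_date (or any other fixed origin) does not alter the slope, and the second form needs only n and four sums:

$$
\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
= \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
$$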

Does anyone have a suggestion on how this could be radically sped up? As the maximum window size increases, the computation time increases significantly. Is there any way to vectorise this calculation?
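One possible way to vectorise it, sketched here under stated assumptions rather than offered as a definitive answer, is to swap the per-window linregress call for the closed-form least-squares slope, (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), where every sum is a time-based rolling sum that pandas computes in compiled code. The helper name rolling_slope is hypothetical (it is not a pandas API). It assumes a sorted DatetimeIndex and measures x in hours from the first timestamp of the whole series instead of each window's min_date, which gives the same slope because the slope does not depend on where x starts.

```python
import numpy as np
import pandas as pd


def rolling_slope(s: pd.Series, window: str = "7d", min_periods: int = 3) -> pd.Series:
    """Rolling least-squares slope (value units per hour) built from rolling sums.

    Hypothetical helper, not part of pandas. Assumes a sorted DatetimeIndex.
    """
    # Hours since one global origin; shifting x does not change the OLS slope,
    # so this matches measuring from each window's own minimum timestamp.
    hours = (s.index - s.index[0]).total_seconds() / 3600.0
    x = pd.Series(np.asarray(hours, dtype=float), index=s.index)
    valid = s.notna()
    # Blank out x wherever y is missing, mirroring the dropna() in get_slope
    x = x.where(valid)

    def rsum(v: pd.Series) -> pd.Series:
        # Time-based rolling sum over (t - window, t]; NaNs are skipped
        return v.rolling(window).sum()

    n = rsum(valid.astype(float))  # number of non-missing points per window
    sx, sy = rsum(x), rsum(s)
    sxy, sxx = rsum(x * s), rsum(x * x)

    # Closed-form OLS slope; 0/0 (a single point) yields NaN
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    # Enforce the minimum number of observations per window
    return slope.where(n >= min_periods)
```

Usage would be gradient = rolling_slope(df['values'], window='7d', min_periods=3). The cost no longer grows with the window size, since each rolling sum is updated incrementally. The trade-off is numerical: the formula subtracts large, similar quantities, so on very long series with large x values some precision can be lost, and it returns only the slope (not r_value, p_value or std_err).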


