Sunday, 30 October 2022

Multiple period persistence, vectorization, time series python

I have a DataFrame with daily values and I am building out a forecast using various methods predicting the values for the next two weeks.

As a naive baseline forecast, I want to simply say that today's value is the best forecast for the next two weeks, e.g.:

  • the value on 01-Jan-2012 is 100, then I would like the forecast for 02-Jan-2012 to 15-Jan-2012 to be 100
  • the value on 02-Jan-2012 is 110, then I would like the forecast for 03-Jan-2012 to 16-Jan-2012 to be 110
  • etc

This method can then be compared to the other forecasts to see whether they add value over a naive approach.

How can I backtest this model? I have a few years' worth of data in a DataFrame and want to do something like the pseudocode below. Reading online, I can only find help for one-day persistence, where something like df.shift(1) does the job.

Pseudocode:
get the first row from the DataFrame
extract the date from the index
extract the value from the column
propagate this value forward for the next fourteen days
save these forecast dates and forecast values

get the second row from the DataFrame
extract the date from the index
extract the value from the column
propagate this value forward for the next fourteen days
save these forecast dates and forecast values

REPEAT...
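
A minimal sketch of the steps above, assuming a single column named `value` on a daily DatetimeIndex (the column name, the helper `persistence_forecasts`, and the sample numbers are all illustrative, not from my real data). It loops over the 14 horizons rather than over the rows, and then joins the forecasts to the actual values so the errors can be inspected:

```python
import pandas as pd

def persistence_forecasts(series, horizon=14):
    """Hypothetical helper: one row per (origin date, forecast date) pair."""
    frames = []
    for h in range(1, horizon + 1):  # loop over horizons, not over rows
        frames.append(pd.DataFrame({
            "origin_date": series.index,
            "forecast_date": series.index + pd.Timedelta(days=h),
            "horizon": h,
            "forecast": series.to_numpy(),  # today's value carried forward
        }))
    out = pd.concat(frames, ignore_index=True)
    return out.sort_values(["origin_date", "horizon"]).reset_index(drop=True)

# Illustrative data only.
idx = pd.date_range("2012-01-01", periods=3, freq="D")
df = pd.DataFrame({"value": [100.0, 110.0, 105.0]}, index=idx)

fc = persistence_forecasts(df["value"])

# Backtest: keep only forecast dates for which an actual value exists.
scored = fc.merge(df["value"].rename("actual"),
                  left_on="forecast_date", right_index=True, how="inner")
scored["error"] = scored["actual"] - scored["forecast"]
```

The long format (one row per origin date and horizon) is a design assumption; it makes it straightforward to group errors by horizon or to stack forecasts from other models alongside.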

However, I've read that iterating over rows is discouraged and that it is better to use something like pandas apply to 'vectorize' the operation, but I am not sure how to do this. I was thinking of writing a function that predicts the next 14 days and then calling it with the apply method, but I am not sure how to do so or whether this is the best way.

I've also read that numpy is very good for these sorts of problems, but again, I am not too familiar with it.
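
As a rough illustration of what a numpy version might look like, the sketch below uses broadcasting to build an (n_days x 14) grid of forecast dates and a matching grid of forecast values with no Python loop. The variable names and sample values are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Illustrative data only.
idx = pd.date_range("2012-01-01", periods=3, freq="D")
values = np.array([100.0, 110.0, 105.0])
horizon = 14

# Day offsets 1..14 as timedelta64 values.
offsets = np.arange(1, horizon + 1) * np.timedelta64(1, "D")

# Broadcasting: a column of origin dates plus a row of offsets gives a 2-D grid.
forecast_dates = idx.to_numpy()[:, None] + offsets[None, :]

# Each row repeats that day's value across all 14 horizons.
forecasts = np.repeat(values[:, None], horizon, axis=1)

# Optional wide view: one row per origin date, one column per horizon.
wide = pd.DataFrame(forecasts, index=idx,
                    columns=[f"h{h}" for h in range(1, horizon + 1)])
```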

I've set up a SQLite database, so I can store the forecasts in there if that helps.
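
For the storage side, a sketch of writing a forecast table to SQLite with pandas might look like the following. The table name `forecasts`, the `model` column, and the in-memory database are assumptions for the example; my real database lives in a file:

```python
import sqlite3
import pandas as pd

# Illustrative forecast rows; a "model" column lets several methods share a table.
fc = pd.DataFrame({
    "origin_date": ["2012-01-01", "2012-01-01"],
    "forecast_date": ["2012-01-02", "2012-01-03"],
    "model": "naive_persistence",
    "forecast": [100.0, 100.0],
})

conn = sqlite3.connect(":memory:")  # swap for a file path in practice
fc.to_sql("forecasts", conn, if_exists="append", index=False)

# Read the rows back to confirm they were stored.
stored = pd.read_sql("SELECT * FROM forecasts", conn)
conn.close()
```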


