I have a DataFrame with daily values, and I am building forecasts with various methods to predict the values for the next two weeks.
As a baseline (naive) forecast, I want to simply say that today's value is the best forecast for each of the next two weeks, e.g.:
- If the value on 01-Jan-2012 is 100, then I would like the forecast for 02-Jan-2012 to 15-Jan-2012 to be 100.
- If the value on 02-Jan-2012 is 110, then I would like the forecast for 03-Jan-2012 to 16-Jan-2012 to be 110.
- etc.
This method can then be compared to the other forecasts to see whether they add value over a naive approach.
How can I backtest this model? I have a few years' worth of data in a DataFrame, and I want to do something like the pseudocode below. Searching online, I can only find help with 1-day persistence, where something like df.shift(1) does the job.
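Because df.shift(1) already gives the 1-day persistence forecast, one way to extend it is to take one shift per horizon h = 1 to 14: the forecast for a given date made h days earlier is simply the value observed h days earlier. This is a minimal sketch, assuming the data sit in a single column named value with one row per calendar day and no gaps (shift(h) moves by rows, not by dates). The synthetic data and the column names h1 to h14 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one row per day, values in a column named "value".
idx = pd.date_range("2012-01-01", periods=30, freq="D")
df = pd.DataFrame({"value": np.arange(100, 130, dtype=float)}, index=idx)

HORIZON = 14  # forecast the next two weeks

# Column "h1" holds, for each target date, the forecast issued 1 day earlier,
# "h2" the forecast issued 2 days earlier, and so on. Under persistence that
# forecast is just the value observed h days earlier, i.e. df["value"].shift(h).
# Assumes a gap-free daily index, because shift(h) moves by rows.
forecasts = pd.concat(
    {f"h{h}": df["value"].shift(h) for h in range(1, HORIZON + 1)}, axis=1
)

# Forecast errors aligned on the target date; NaNs (no forecast yet) are skipped.
errors = forecasts.sub(df["value"], axis=0)
mae_by_horizon = errors.abs().mean()
print(mae_by_horizon)
```

With this layout each row is a target date, so the same error table can be built for the other forecasting methods and compared horizon by horizon.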
Pseudocode:
get the first row from the DataFrame
extract the date from the index
extract the value from the column
propagate forward this value for the next fourteen days
save these forecast dates and forecast values
get the second row from the DataFrame
extract the date from the index
extract the value from the column
propagate forward this value for the next fourteen days
save these forecast dates and forecast values
REPEAT...
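A direct translation of the pseudocode into a row-by-row loop could look like the sketch below. It assumes a DatetimeIndex and a column named value; the synthetic data and the output column names (origin, target, horizon, forecast, actual) are hypothetical. For a few years of daily data (roughly a thousand rows times 14 steps) a loop like this is still quick, and it is handy as a reference to check faster versions against.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one row per day, values in a column named "value".
idx = pd.date_range("2012-01-01", periods=30, freq="D")
df = pd.DataFrame({"value": np.arange(100, 130, dtype=float)}, index=idx)

HORIZON = 14

records = []
for origin, value in df["value"].items():
    # The 14 calendar days after the origin date.
    target_dates = pd.date_range(
        origin + pd.Timedelta(days=1), periods=HORIZON, freq="D"
    )
    for step, target in enumerate(target_dates, start=1):
        # Persistence: every future day gets the origin day's value.
        records.append(
            {"origin": origin, "target": target, "horizon": step, "forecast": value}
        )

forecasts_long = pd.DataFrame(records)

# Attach the observed value on each target date (NaN past the end of the data).
forecasts_long["actual"] = (
    df["value"].reindex(forecasts_long["target"]).to_numpy()
)
```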
However, I've read that iterating over rows is discouraged and that it is better to use something like pandas apply to "vectorize" the operation, but I am not sure how to do this. I was thinking of writing a function that predicts the next 14 days and then calling it with the apply method, but I'm not sure how to do so, or whether this is the best approach.
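DataFrame.apply with a custom function still calls that function once per row in Python, so it is more a convenience than true vectorization. A sketch of a fully array-based alternative, assuming the same single value column and DatetimeIndex (the data and column names are hypothetical), builds every (origin, horizon) pair at once with np.repeat and np.tile and derives the target dates by adding day offsets:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one row per day, values in a column named "value".
idx = pd.date_range("2012-01-01", periods=30, freq="D")
df = pd.DataFrame({"value": np.arange(100, 130, dtype=float)}, index=idx)

HORIZON = 14
n = len(df)

# Each origin date repeated 14 times, paired with steps 1..14.
forecasts_long = pd.DataFrame({
    "origin": np.repeat(df.index.values, HORIZON),
    "horizon": np.tile(np.arange(1, HORIZON + 1), n),
    # Persistence: every step carries the value observed on the origin date.
    "forecast": np.repeat(df["value"].to_numpy(), HORIZON),
})

# Target date = origin + horizon days.
forecasts_long["target"] = forecasts_long["origin"] + pd.to_timedelta(
    forecasts_long["horizon"], unit="D"
)

# Observed value on each target date (NaN past the end of the data).
forecasts_long["actual"] = (
    df["value"].reindex(forecasts_long["target"]).to_numpy()
)
```

Because targets are computed by date arithmetic rather than by row position, this version does not rely on the index being gap-free; a missing target day simply gets NaN as its actual.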
I've also read that NumPy is very good for these sorts of problems, but again, I am not too familiar with it.
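In NumPy the backtest can be expressed as two aligned matrices: one row per forecast origin, one column per horizon. A minimal sketch, assuming a gap-free daily array of values (for example df["value"].to_numpy()) and NumPy 1.20 or later for sliding_window_view; the synthetic data are hypothetical. Only origins with a full 14 days of actuals after them are scored.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical daily values, e.g. df["value"].to_numpy().
values = np.arange(100, 130, dtype=float)
HORIZON = 14

# Origins that have 14 observed days after them.
n_origins = len(values) - HORIZON

# forecast[i, j]: forecast issued on day i for day i + j + 1.
# Persistence copies the origin value across all 14 columns.
forecast = np.repeat(values[:n_origins, None], HORIZON, axis=1)

# actual[i, j] = values[i + j + 1], i.e. row i is values[i + 1 : i + 1 + HORIZON].
actual = sliding_window_view(values[1:], HORIZON)[:n_origins]

# Mean absolute error for each horizon (1 day ahead ... 14 days ahead).
mae_by_horizon = np.abs(forecast - actual).mean(axis=0)
```

Any other method that produces a forecast matrix of the same shape can be scored against the same actual matrix, which makes the comparison with the naive baseline a one-line subtraction.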
I've set up an SQLite database, so I can store the forecasts there if that helps.
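If the forecasts are kept in long format (one row per origin date and target date), pandas can write them to SQLite directly with DataFrame.to_sql and read them back with pd.read_sql_query. This is a sketch under assumptions: the table name forecasts, the model column and its label, and the tiny example frame are all hypothetical, and an in-memory database stands in for your own database file.

```python
import sqlite3

import pandas as pd

# Hypothetical long-format forecasts for one origin date.
forecasts_long = pd.DataFrame({
    "origin": pd.to_datetime(["2012-01-01"] * 3),
    "target": pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-04"]),
    "horizon": [1, 2, 3],
    "forecast": [100.0, 100.0, 100.0],
})
# Hypothetical label so several methods can share one table.
forecasts_long["model"] = "naive_persistence"

# ":memory:" keeps the example self-contained; pass your database file path instead.
conn = sqlite3.connect(":memory:")
forecasts_long.to_sql("forecasts", conn, if_exists="append", index=False)

# Read one model's forecasts back, converting the date columns to datetimes.
stored = pd.read_sql_query(
    "SELECT * FROM forecasts WHERE model = ? ORDER BY target",
    conn,
    params=("naive_persistence",),
    parse_dates=["origin", "target"],
)
conn.close()
```

Tagging each row with the method that produced it lets the other forecasts go into the same table, so comparing them against the naive baseline becomes a query or a join on origin, target and horizon.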
from Multiple period persistence, vectorization, time series python