I am working on a time series analysis with SARIMAX and have been really struggling with it.
I think I have successfully fit a model and used it to make predictions, but I don't know how to make out-of-sample forecasts with exogenous data.
I may be doing the whole thing wrong, so I have included my steps below with some sample data:
import pandas as pd
import numpy as np
# Defining Sample data
df = pd.DataFrame({
    'date': ['2019-01-01', '2019-01-02', '2019-01-03',
             '2019-01-04', '2019-01-05', '2019-01-06',
             '2019-01-07', '2019-01-08', '2019-01-09',
             '2019-01-10', '2019-01-11', '2019-01-12'],
    'price': [78, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80],
    'factor1': [178, 287, 152, 294, 155, 245, 168, 276, 165, 275, 178, 221]
})
# Parsing the dates (they are formatted as YYYY-MM-DD) and using them as a sorted datetime index
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date')
df.sort_index(inplace=True)
df.dropna(inplace=True)
# Splitting Data into test and training sets manually
train=df.loc['2019-01-01':'2019-01-09']
test=df.loc['2019-01-10':'2019-01-12']
# Converting the index to a daily PeriodIndex for the test and train datasets
train.index = pd.DatetimeIndex(train.index).to_period('D')
test.index = pd.DatetimeIndex(test.index).to_period('D')
# Defining and fitting the model with training data for endogenous and exogenous data
import statsmodels.api as sm
model = sm.tsa.statespace.SARIMAX(
    train['price'],
    order=(0, 0, 0),
    seasonal_order=(0, 0, 0, 12),
    exog=train.iloc[:, 1:],
    time_varying_regression=True,
    mle_regression=False)
model_1 = model.fit(disp=False)
# Defining exogenous data for testing
exog_test=test.iloc[:,1:]
# Forecasting out-of-sample data with exogenous data
forecast = model_1.forecast(3, exog=exog_test)
So my problem is really with the last line: what do I do if I want to forecast more than 3 steps?
from SARIMAX out of sample forecast with exogenous data