Thursday, 27 August 2020

Any way to predict monthly time series with scikit-learn in Python?

I want to forecast a product's sales_index using multiple features in a monthly time series. I started with ARMA and ARIMA, but the output was not satisfying. In that attempt I used only the dates and the sales column, and the forecast does not look realistic to me. I think I should include more feature columns to predict the sales_index column. Is there any way to make this prediction using multiple features from the monthly time series? I haven't done much time series work with scikit-learn. Can anyone point me to a possible way of doing this?

My attempt using ARMA/ARIMA:

Here is reproducible monthly time series data on this gist and here is my current attempt:

import pandas as pd
import matplotlib.pyplot as plt
# The old statsmodels.tsa.arima_model module is no longer available in newer
# statsmodels releases; ARIMA with order=(p, 0, q) is equivalent to ARMA(p, q).
from statsmodels.tsa.arima.model import ARIMA

# Load the data and attach a month-start date index beginning in January 2015
df = pd.read_csv("tsdf.csv", sep=",")
dates = pd.date_range(start='2015-01', freq='MS', periods=len(df))
df.set_index(dates, inplace=True)

# Chronological split: train before 2019, test from 2019 onwards
train = df[df.index < '2019-01']
test = df[df.index >= '2019-01']

# ARMA(2, 0) on the sales column only, written as ARIMA(2, 0, 0)
model = ARIMA(train['sales_index'], order=(2, 0, 0))
model_fit = model.fit()
predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)

# Plot actual test values against the predictions (red)
plt.figure(figsize=(12,6))
plt.plot(test['sales_index'])
plt.plot(predictions, color='red')
plt.show()

And here is the output of my current attempt:

[Image: plot of the test-set sales_index with the ARMA predictions in red]

In my attempt, I simply used df['sales_index'] and the dates for the ARMA model. Clearly, the prediction is not very realistic or informative this way. I am wondering whether there is any way to feed all feature columns except df['sales_index'] to the model to predict df['sales_index']. I couldn't figure out a better way of doing this with the ARMA model.

Perhaps scikit-learn would be better suited to this prediction, but I am not sure how to do this kind of time series analysis with sklearn. Can anyone point me to a possible sklearn solution for this time series? Is it possible to do this in sklearn at all? Thanks.
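For reference, a common way to frame this in scikit-learn is to turn the series into an ordinary regression table: lagged values of the target become extra columns next to the other features, and the split stays chronological. The sketch below is an assumption-laden illustration, not a definitive solution: the data is synthetic, feature_a and feature_b are hypothetical column names, and RandomForestRegressor is just one possible regressor. Because the lag columns in the test period use actual past sales, this gives one-step-ahead predictions rather than a full multi-month forecast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for tsdf.csv; 'feature_a' and 'feature_b' are
# hypothetical column names, not the real ones from the gist.
rng = np.random.default_rng(0)
dates = pd.date_range(start="2015-01", freq="MS", periods=72)
df = pd.DataFrame(
    {"feature_a": rng.normal(size=72), "feature_b": rng.normal(size=72)},
    index=dates,
)
df["sales_index"] = (
    100 + 5 * df["feature_a"] - 3 * df["feature_b"] + rng.normal(scale=0.5, size=72)
)

# Add lagged copies of the target and a calendar feature
data = df.copy()
for lag in (1, 2, 3):
    data[f"sales_lag_{lag}"] = data["sales_index"].shift(lag)
data["month"] = data.index.month
data = data.dropna()  # the first three months have incomplete lags

X = data.drop(columns="sales_index")
y = data["sales_index"]

# Chronological split, never a random shuffle, to avoid leaking the future
X_train, y_train = X[X.index < "2019-01"], y[y.index < "2019-01"]
X_test, y_test = X[X.index >= "2019-01"], y[y.index >= "2019-01"]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

predictions = pd.Series(model.predict(X_test), index=X_test.index)
mae = mean_absolute_error(y_test, predictions)
print(f"MAE on the 2019+ test period: {mae:.3f}")
```

For model selection on such a table, sklearn.model_selection.TimeSeriesSplit can replace ordinary k-fold cross-validation so that each validation fold comes after its training fold.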



