Saturday 7 November 2020

any workaround to do forward forecasting for estimating time series in python?

I want to make forward forecasting for monthly times series of air pollution data such as what would be 3~6 months ahead of estimation on air pollution index. I tried scikit-learn models for forecasting and fitting data to the model works fine. But what I wanted to do is making a forward period estimate such as what would be 6 months ahead of the air pollution output index is going to be. In my current attempt, I could able to train the model by using scikit-learn. But I don't know how that forward forecasting can be done in python. To make a forward period estimate, what should I do? Can anyone suggest a possible workaround to do this? Any idea?

my attempt

import pandas as pd
from sklearn.preprocessing StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import BayesianRidge

url = "https://gist.githubusercontent.com/jerry-shad/36912907ba8660e11cd27be0d3e30639/raw/424f0891dc46d96cd5f867f3d2697777ac984f68/pollution.csv"

df = pd.read_csv(url, parse_dates=['dates'])
df.drop(columns=['Unnamed: 0'], inplace=True)

resultsDict={}
predictionsDict={}

split_date ='2017-12-01'
df_training = df.loc[df.index <= split_date]
df_test = df.loc[df.index > split_date]

df_tr = df_training.drop(['pollution_index'],axis=1)
df_te = df_test.drop(['pollution_index'],axis=1)

scaler = StandardScaler() 
scaler.fit(df_tr)
X_train = scaler.transform(df_tr)  
y_train = df_training['pollution_index']
X_test = scaler.transform(df_te)
y_test = df_test['pollution_index']
X_train_df = pd.DataFrame(X_train,columns=df_tr.columns)
X_test_df = pd.DataFrame(X_test,columns=df_te.columns)

reg = linear_model.BayesianRidge()
reg.fit(X_train, y_train)
yhat = reg.predict(X_test)
resultsDict['BayesianRidge'] = accuracy_score(df_test['pollution_index'], yhat)

new update 2

this is my attempt using ARMA model

from statsmodels.tsa.arima_model import ARIMA

index = len(df_training)
yhat = list()
for t in tqdm(range(len(df_test['pollution_index']))):
    temp_train = df[:len(df_training)+t]
    model = ARMA(temp_train['pollution_index'], order=(1, 1))
    model_fit = model.fit(disp=False)
    predictions = model_fit.predict(start=len(temp_train), end=len(temp_train), dynamic=False)
    yhat = yhat + [predictions]
    
yhat = pd.concat(yhat)
resultsDict['ARMA'] = evaluate(df_test['pollution_index'], yhat.values)

but this can't help me to make forward forecasting of estimating my time series data. what I want to do is, what would be 3~6 months ahead of estimated values of pollution_index. Can anyone suggest me a possible workaround to do this? How to overcome the limitation of my current attempt? What should I do? Can anyone suggest me a better way of doing this? Any thoughts?

update: goal

for the clarification, I am not expecting which model or approach works best, but what I am trying to figure it out is, how to make reliable forward forecasting for given time series (pollution index), how should I correct my current attempt if it is not efficient and not ready to do forward period estimation. Can anyone suggest any possible way to do this?

update-desired output

here is my sketch desired forecasting plot that I want to make:

enter image description here



from any workaround to do forward forecasting for estimating time series in python?

No comments:

Post a Comment