Tuesday, 22 December 2020

Adding exogenous variables to my univariate LSTM model

My data frame is on an hourly basis (the hourly timestamps are the index of my df) and I want to predict y.

> df.head()

          Date            y
    2019-10-03 00:00:00  343
    2019-10-03 01:00:00  101
    2019-10-03 02:00:00   70
    2019-10-03 03:00:00   67
    2019-10-03 04:00:00  122
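
For context, this is a minimal sketch of how an hourly frame like this can be built (the values here are made up for illustration only; my real data comes from elsewhere):

  import numpy as np
  import pandas as pd

  # hourly DatetimeIndex starting where my data starts (values are random, for illustration)
  idx = pd.date_range('2019-10-03 00:00:00', periods=24 * 60, freq='H')
  df = pd.DataFrame({'y': np.random.randint(50, 400, size=len(idx))}, index=idx)
  df.index.name = 'Date'
  print(df.head())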

I will now import the libraries and train the model:

  from keras.models import Sequential
  from keras.layers import Dense
  from keras.layers import LSTM
  from sklearn.preprocessing import MinMaxScaler
  import numpy as np

  min_max_scaler = MinMaxScaler()
  prediction_hours = 24

  # hold out the last 24 hours as the test set
  df_train = df[:len(df)-prediction_hours]
  df_test = df[len(df)-prediction_hours:]
  print(df_train.head())
  print('/////////////////////////////////////////')
  print(df_test.head())

  # scale the training values to [0, 1]
  training_set = df_train.values
  training_set = min_max_scaler.fit_transform(training_set)

  # predict the next hour's y from the current hour's y
  x_train = training_set[0:len(training_set)-1]
  y_train = training_set[1:len(training_set)]
  x_train = np.reshape(x_train, (len(x_train), 1, 1))  # (samples, timesteps, features)

  num_units = 2
  activation_function = 'sigmoid'
  optimizer = 'adam'
  loss_function = 'mean_squared_error'
  batch_size = 10
  num_epochs = 100

  regressor = Sequential()
  regressor.add(LSTM(units=num_units, activation=activation_function, input_shape=(None, 1)))
  regressor.add(Dense(units=1))
  regressor.compile(optimizer=optimizer, loss=loss_function)
  regressor.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)
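
As a quick sanity check (my own addition, not part of the GitHub code I followed), I print the shapes Keras sees and the model summary:

  print(x_train.shape)   # (n_samples, 1, 1): one timestep, one feature (just y)
  print(y_train.shape)   # (n_samples, 1): the next hour's scaled y
  regressor.summary()    # the LSTM layer followed by the Dense output layer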

And after training, I can actually use it on my test data:

  test_set = df_test.values
  inputs = np.reshape(test_set, (len(test_set), 1))
  inputs = min_max_scaler.transform(inputs)             # scale with the scaler fitted on the training set
  inputs = np.reshape(inputs, (len(inputs), 1, 1))      # (samples, timesteps, features)
  predicted_y = regressor.predict(inputs)
  predicted_y = min_max_scaler.inverse_transform(predicted_y)  # back to the original scale
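
To compare the forecast with the actual values I plot both series; a minimal matplotlib sketch (the plotting code is my own and not from the GitHub implementation):

  import matplotlib.pyplot as plt

  plt.figure(figsize=(10, 4))
  plt.plot(df_test.index, test_set, label='actual y')
  plt.plot(df_test.index, predicted_y, label='predicted y')
  plt.xlabel('Date')
  plt.ylabel('y')
  plt.legend()
  plt.show()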

This is the prediction I got:

[plot of predicted vs. actual y over the 24 test hours]

The forecast is actually pretty good: is it too good to be true? Am I doing something wrong? I followed a GitHub implementation step by step.

I want to add some exogenous variables, namely v1, v2 and v3. If my dataset now looks like this with the new variables,

df.head()

          Date            y   v1   v2   v3
    2019-10-03 00:00:00  343   4    6   10
    2019-10-03 01:00:00  101   3    2   24
    2019-10-03 02:00:00   70   0    0   50
    2019-10-03 03:00:00   67   0    4   54
    2019-10-03 04:00:00  122   3    3   23

how can I include the variables v1, v2 and v3 in my LSTM model? The implementation of a multivariate LSTM is very confusing to me.

Edit to answer Yoan's suggestion:

For a dataframe with the date as the index and the columns y, v1, v2 and v3, I've done the following as suggested:

  from keras.models import Sequential
  from keras.layers import Dense
  from keras.layers import LSTM
  from sklearn.preprocessing import MinMaxScaler
  import numpy as np

  min_max_scaler = MinMaxScaler()
  prediction_hours = 24
  df_train = df[:len(df)-prediction_hours]
  df_test = df[len(df)-prediction_hours:]
  print(df_train.head())
  print('/////////////////////////////////////////')
  print(df_test.head())
  training_set = df_train.values
  training_set = min_max_scaler.fit_transform(training_set)

  x_train = np.reshape(x_train, (len(x_train), 1, 4))
  y_train = training_set[0:len(training_set), 1]  # I've tried with 0:len.. and with 1:len..

  num_units = 2
  activation_function = 'sigmoid'
  optimizer = 'adam'
  loss_function = 'mean_squared_error'
  batch_size = 10
  num_epochs = 100
  regressor = Sequential()
  regressor.add(LSTM(units=num_units, activation=activation_function, input_shape=(None, 1, 4)))
  regressor.add(Dense(units=1))
  regressor.compile(optimizer=optimizer, loss=loss_function)
  regressor.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)

But I get the following error:

 only integer scalar arrays can be converted to a scalar index
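
For clarity, here is a minimal sketch of what I think the multivariate input preparation should look like: the previous hour's [y, v1, v2, v3] (4 features, 1 timestep) is used to predict the current hour's y, and I'm assuming y is column 0 of the scaled array. I'm not sure this is what Yoan meant:

  import numpy as np
  # (same keras imports and hyper-parameters as above)

  # training_set is the scaled array with columns in the order [y, v1, v2, v3]
  x_train = training_set[:-1]                           # previous hour, all 4 features
  y_train = training_set[1:, 0]                         # current hour's y (column 0)
  x_train = np.reshape(x_train, (len(x_train), 1, 4))   # (samples, timesteps=1, features=4)

  regressor = Sequential()
  regressor.add(LSTM(units=num_units, activation=activation_function,
                     input_shape=(1, 4)))               # timesteps=1, features=4 (no batch dimension)
  regressor.add(Dense(units=1))
  regressor.compile(optimizer=optimizer, loss=loss_function)
  regressor.fit(x_train, y_train, batch_size=batch_size, epochs=num_epochs)

One thing I'm still unsure about is the inverse transform at prediction time: the scaler was fitted on four columns, so inverse_transform can't be applied directly to a single-column prediction.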


from Adding exogenous variables to my univariate LSTM model
