Following up on my answered question, "R or Python - loop the test data - Prediction validation next 24 hours (96 values each day)":
I want to predict the next day using the H2O package. A detailed explanation of my dataset is in the question linked above.
H2O uses its own data format (an H2OFrame) rather than a regular R data.frame.
After making the prediction, I want to calculate the MAPE for each day.
First I have to convert the training and testing data to H2O format:
train_h2o <- as.h2o(train_data)
test_h2o <- as.h2o(test_data)
# MAPE for one day's rows (non-H2O version)
mape_calc <- function(sub_df) {
  pred <- predict.glm(glm_model, sub_df)
  actual <- sub_df$Ptot
  mape <- 100 * mean(abs((actual - pred)/actual))
  new_df <- data.frame(date = sub_df$date[[1]], mape = mape)
  return(new_df)
}
# LIST OF ONE-ROW DATAFRAMES
df_list <- by(test_data, test_data$date, mape_calc)
# FINAL DATAFRAME
final_df <- do.call(rbind, df_list)
The code above works well for non-H2O day-ahead prediction validation: it calculates the MAPE for every day.
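To make the per-day pattern concrete, here is a minimal self-contained sketch in base R. It uses a tiny made-up data frame with a precomputed pred column standing in for the model output; toy_test, mape_toy, toy_list and toy_final are hypothetical names for illustration only, not part of the real dataset.

```r
# Toy data: 2 days x 4 readings (the real data has 96 readings per day).
toy_test <- data.frame(
  date = rep(as.Date(c("2018-01-01", "2018-01-02")), each = 4),
  Ptot = c(100, 200, 100, 200, 50, 50, 100, 100),
  pred = c(110, 180, 100, 220, 55, 50, 100, 90)  # stand-in for model output
)

# MAPE for the rows of a single day, returned as a one-row data frame.
mape_toy <- function(sub_df) {
  mape <- 100 * mean(abs((sub_df$Ptot - sub_df$pred) / sub_df$Ptot))
  data.frame(date = sub_df$date[[1]], mape = mape)
}

# Split by date, apply the function, and stack the one-row results.
toy_list <- by(toy_test, toy_test$date, mape_toy)
toy_final <- do.call(rbind, toy_list)
print(toy_final)  # one row per day
```

The result has one row per distinct date, which is the shape final_df is expected to have.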
I tried to convert the H2O model to a regular (non-H2O) format, but according to https://stackoverflow.com/a/39221269/9341589 this is not possible.
To make a prediction in H2O, let's say, for instance, that we want to create a random forest model:
y <- "RealPtot"                    # target
x <- setdiff(names(train_h2o), y)  # features (all other columns)
rforest.model <- h2o.randomForest(y = y, x = x, training_frame = train_h2o, ntrees = 2000, mtries = 3, max_depth = 4, seed = 1122)
Then we can get the predictions for the complete test dataset as shown below:
predict.rforest <- as.data.frame(h2o.predict(rforest.model, test_h2o))
But in my case I am trying to get the MAPE of each one-day prediction using mape_calc,
so I modified the code to accept the H2O input format:
mape_calc <- function(sub_df) {
  pred <- predict(rforest.model, sub_df)
  # I modified this line
  actual <- sub_df[, "RealPtot"]
  mape <- 100 * mean(abs((actual - pred)/actual))
  # And I changed this line
  new_df <- data.frame(date = sub_df[, "date"][[1]], mape = mape)
  return(new_df)
}
# LIST OF ONE-ROW DATAFRAMES
df_list <- by(test_h2o, test_h2o[, "RealPtot"], mape_calc )
# FINAL DATAFRAME
final_df <- do.call(rbind, df_list)
I am getting an error at the df_list stage:
Error in unique.default(x, nmax = nmax) :
invalid type/length (environment/0) in vector allocation
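The error appears to come from by() itself: it is written for regular data frames and splits on an ordinary R vector, whereas an H2OFrame is a reference to data held by the H2O cluster. One possible workaround, sketched here under that assumption and not verified against a live cluster, is to predict once on the whole test frame, pull the predictions back into R with as.data.frame, and do the per-day grouping on plain R vectors. The helper daily_mape and the toy vectors are hypothetical; the H2O calls appear only in comments so that the block runs without a cluster, and they assume test_data is the regular data frame that test_h2o was built from, with row order preserved.

```r
# Hypothetical helper: per-day MAPE computed from plain R vectors.
daily_mape <- function(date, actual, pred) {
  ape <- abs((actual - pred) / actual)     # absolute percentage error per row
  mape <- 100 * tapply(ape, date, mean)    # mean per distinct date
  data.frame(date = names(mape),           # dates come back as character
             mape = as.numeric(mape),
             stringsAsFactors = FALSE)
}

# With H2O the inputs would be obtained roughly like this (not run here):
#   pred   <- as.data.frame(h2o.predict(rforest.model, test_h2o))$predict
#   result <- daily_mape(test_data$date, test_data$RealPtot, pred)

# Toy stand-in values so the sketch runs without an H2O cluster.
date   <- rep(c("2018-01-01", "2018-01-02"), each = 4)
actual <- c(100, 200, 100, 200, 50, 50, 100, 100)
pred   <- c(110, 180, 100, 220, 55, 50, 100, 90)

result <- daily_mape(date, actual, pred)
print(result)  # one row per day
```

Predicting once and grouping afterwards also avoids sending one small frame per day to the cluster, which may matter when the test set covers many days.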
NOTE: Any thoughts in R or Python will be appreciated.
from Predict Day-Ahead in parallelized and scalabile Environment -H2o Package - R or Python