Thursday 26 November 2020

Can we make the ML model (pickle file) more robust, by accepting (or ignoring) new features?

  • I have trained an ML model and stored it in a pickle file.
  • In my new script, I am reading new 'real world data', on which I want to make predictions.

However, I am struggling. I have a column (containing string values), like:

Sex       
Male       
Female
# This is just an example; in reality the column has many more unique values

Now comes the issue: the new data contains a value the model has never seen (e.g. 'Neutral' was added), and I can no longer make predictions.

Since I am transforming the 'Sex' column into dummies, the model no longer accepts the input:

Number of features of the model must match the input. Model n_features is 2 and input n_features is 3
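
To make the mismatch concrete, here is a minimal sketch of how an unseen category changes the number of dummy columns. The two small DataFrames are hypothetical stand-ins for the training data and the new data, and it assumes the dummies are created with pd.get_dummies:

```python
import pandas as pd

# Hypothetical training data: the model has only ever seen two categories.
train = pd.DataFrame({"Sex": ["Male", "Female"]})

# Hypothetical new data: it contains the unseen category 'Neutral'.
new = pd.DataFrame({"Sex": ["Male", "Neutral", "Female"]})

# Each DataFrame is encoded on its own, so each gets one column per
# category that happens to be present in it.
train_dummies = pd.get_dummies(train, columns=["Sex"])
new_dummies = pd.get_dummies(new, columns=["Sex"])

print(train_dummies.shape[1])  # number of features the model was trained on
print(new_dummies.shape[1])    # number of features in the new input

# Columns that exist only in the new data
extra_cols = set(new_dummies.columns) - set(train_dummies.columns)
print(extra_cols)
```

The extra 'Sex_Neutral' column is what makes the feature counts differ (2 versus 3).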

Therefore my question: is there a way to make my model robust, so that it simply ignores this class but still makes a prediction without that specific information?

What I have tried:

import pickle

import pandas as pd

# New real-world data to predict on
df = pd.read_csv('dataset_that_i_want_to_predict.csv')

# Load the trained model from the pickle file
with open("model_trained.sav", 'rb') as f:
    model = pickle.load(f)

# 'example_df' contains just 1 row of training data (exactly the columns the model needs)
example_df = pd.read_csv('reading_one_row_of_trainings_data.csv')

# Add any columns that are missing from the new dataset, filled with 0
# (which is OK, since every column is a dummy)
missing_cols = set(example_df.columns) - set(df.columns)
for column in missing_cols:
    df[column] = 0

# Make sure the columns are in the same order
df = df[example_df.columns]

# The prediction leads to an error!
results = model.predict(df)

# ValueError: Number of features of the model must match the input. Model n_features is X and input n_features is Y

Note: I searched, but could not find any helpful solution (not here, here or here).

UPDATE

I also found this article, but it has the same issue: we can make the test set have the same columns as the training set, but what about new real world data (e.g. the new value 'Neutral')?
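
One possible direction, sketched here under assumptions rather than as a confirmed fix: save the list of dummy columns used at training time, encode the new data, and then reindex it to exactly that list. Columns created by unseen values are dropped, and columns that are absent are filled with 0. The column names and the train_columns list below are hypothetical.

```python
import pandas as pd

# Hypothetical: the exact feature columns (and order) used when training,
# e.g. saved next to the pickle file at training time.
train_columns = ["Age", "Sex_Female", "Sex_Male"]

# Hypothetical new real-world data with the unseen value 'Neutral'.
new = pd.DataFrame({"Age": [31, 45], "Sex": ["Neutral", "Male"]})

# Encode the new data; this creates 'Sex_Male' and 'Sex_Neutral' here.
encoded = pd.get_dummies(new, columns=["Sex"], dtype=int)

# Keep only the training columns, in the training order.
# Unknown dummy columns are dropped, missing ones are filled with 0.
aligned = encoded.reindex(columns=train_columns, fill_value=0)

print(aligned)
```

With this alignment, a row with the unseen value gets 0 in every 'Sex' dummy, so the model would predict without that specific information. Whether that is acceptable depends on the model and the data.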


