Sunday 8 September 2019

When do feature selection in imblearn pipeline with cross-validation and grid search

Currently I am building a classifier with heavily imbalanced data. I am using the imblearn pipeline to first to StandardScaling, SMOTE, and then the classification with gridSearchCV. This ensures that the upsampling is done during the cross-validation. Now I want to include feature_selection into my pipeline. How should I include this step into the pipeline?

model = Pipeline([
        ('sampling', SMOTE()),
        ('classification', RandomForestClassifier())
    ])

param_grid = { 
    'classification__n_estimators': [10, 20, 50],
    'classification__max_depth' : [2,3,5]
}

gridsearch_model = GridSearchCV(model, param_grid, cv = 4, scoring = make_scorer(recall_score))
gridsearch_model.fit(X_train, y_train)
predictions = gridsearch_model.predict(X_test)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))



from When do feature selection in imblearn pipeline with cross-validation and grid search

No comments:

Post a Comment