Friday, 21 July 2023

Performing RFECV in Python and understanding the output

I am running RFECV (scikit-learn) in Python on pandas data. My step size is 1 and I start with 174 features. My call is below:

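from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# X_train (features) and y (DataFrame with a 'tag' column) are defined elsewhere
rfecv = RFECV(estimator=LogisticRegression(solver='lbfgs'), step=1,
              cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=44),
              scoring='recall', min_features_to_select=30, verbose=0)
rfecv.fit(X_train, y['tag'])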

The optimal number of features returned by RFECV is 89. I noticed that the length of cv_results_['mean_test_score'] is 145.

Shouldn't it be 174 - 89 = 85? If RFECV removes one feature at a time and ends up with 89 of the 174 features, I expected 85 elimination steps, and therefore 85 entries in 'mean_test_score'.
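The observed length can be checked with a little arithmetic. The sketch below assumes the behaviour described in the scikit-learn documentation: RFECV scores every subset size from the full feature count down to min_features_to_select and only then picks the best size, so the score arrays hold ceil((n_features - min_features_to_select) / step) + 1 entries regardless of which size wins. The helper name expected_rfecv_score_count is made up for illustration, and it assumes an integer step.

```python
import math


def expected_rfecv_score_count(n_features, min_features_to_select, step=1):
    """Number of subset sizes scored, assuming every size from n_features
    down to min_features_to_select is evaluated (integer step assumed)."""
    return math.ceil((n_features - min_features_to_select) / step) + 1


# 174 features, min_features_to_select=30, step=1
print(expected_rfecv_score_count(174, 30))  # 145
```

Under that assumption the count depends only on the starting number of features, the minimum, and the step, not on the 89 features finally selected.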

Adding a dummy example:

In the case below, we start with 150 features. The minimum number of features to select is 3, and it selects 4 features. Why, then, does print(len(selector.cv_results_['std_test_score'])) give 148 if one feature is eliminated at a time?

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=50, n_features=150, random_state=0)
estimator = SVR(kernel="linear")
selector = RFECV(estimator, step=1, cv=5, min_features_to_select=3)
selector = selector.fit(X, y)
print(selector.support_)
print(selector.ranking_)
print(selector.n_features_)

print(len(selector.cv_results_['std_test_score']))

