I am running RFECV in Python using pandas. My step size is 1 and I start with 174 features. My function call is below:
rfecv = RFECV(estimator=LogisticRegression(solver='lbfgs'), step=1,
              cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=44),
              scoring='recall', min_features_to_select=30, verbose=0)
rfecv.fit(X_train, y['tag'])
The optimal number of features returned by RFECV is 89, but I noticed that the length of cv_results_['mean_test_score'] is 145.
Shouldn't it be 174 - 89 = 85? If RFECV removes one feature at a time and ends up with 89 of the 174 features, I expected there to be 85 elimination steps, and therefore 85 entries in 'mean_test_score'.
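For comparison, here is a quick check on the fitted object above (just a sketch; the 174, 89, and 145 are the numbers from my run, and the idea that there could be one score per candidate feature count from min_features_to_select up to the full set is only a guess I would like confirmed):

print(174 - 89)                                     # 85, the length I expected
print(len(range(30, 174 + 1)))                      # 145, one entry per candidate count from 30 up to 174
print(len(rfecv.cv_results_['mean_test_score']))    # 145, what I actually observe
print(rfecv.n_features_)                            # 89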
# Adding a dummy example -------------------------
In the case below we start with 150 features, min_features_to_select is 3, and it ends up selecting 4 features. Why then does print(len(selector.cv_results_['std_test_score'])) give 148 if one feature is eliminated at a time?
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

# 150 features, eliminate one per step, keep at least 3
X, y = make_friedman1(n_samples=50, n_features=150, random_state=0)
estimator = SVR(kernel="linear")
selector = RFECV(estimator, step=1, cv=5, min_features_to_select=3)
selector = selector.fit(X, y)

print(selector.support_)                              # boolean mask of selected features
print(selector.ranking_)                              # feature ranking (1 = selected)
print(selector.n_features_)                           # 4
print(len(selector.cv_results_['std_test_score']))    # 148
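The same check as above for the dummy example (again only a sketch of my guess; under my one-score-per-eliminated-feature reasoning I would have expected 150 - 4 = 146):

print(150 - selector.n_features_)                     # 146, what my reasoning predicts
print(len(range(3, 150 + 1)))                         # 148, one entry per candidate count from 3 up to 150
print(len(selector.cv_results_['std_test_score']))    # 148, what the run actually prints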
from performing rfec in python and understanding output