Good morning,
I am trying to select features with RFECV from sklearn.feature_selection, and I am puzzled by the plot of number of features vs. CV score. The curve goes up and down at almost every step, which makes the results hard to trust. The reported optimal number of features is 5, out of 92 candidates backed by the scientific literature on my topic.
The RFECV code is below. Note that I use an XGBoost classifier, the score to optimize is neg_log_loss, and randomCV_clf is a custom CV splitter yielding train/validation indices for 5 folds (tested elsewhere and working fine; the interface it implements is sketched after the classifier definition).
import xgboost as xgb

xgboost_clf = xgb.XGBClassifier(
    random_state=57,
    grow_policy="depthwise",
    booster="gbtree",
    tree_method="auto",
)
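For context, randomCV_clf follows the standard sklearn splitter protocol: RFECV accepts any object exposing get_n_splits() and a split() method that yields (train_indices, validation_indices) pairs. A hypothetical minimal version, not my actual implementation, might look like this:

import numpy as np

class RandomCVSplitter:
    # Hypothetical stand-in for randomCV_clf: any object with this
    # interface works as the cv= argument of RFECV.
    def __init__(self, n_splits=5, random_state=57):
        self.n_splits = n_splits
        self.random_state = random_state

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        # Shuffle all row indices once, then cut them into n_splits folds.
        rng = np.random.default_rng(self.random_state)
        indices = rng.permutation(len(X))
        for fold in np.array_split(indices, self.n_splits):
            train = np.setdiff1d(indices, fold)  # everything not in the fold
            yield train, fold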
from sklearn.feature_selection import RFECV

step = 1
min_features_to_select = 1
rfecv = RFECV(
    estimator=xgboost_clf,
    step=step,  # drop 1 feature per elimination round
    cv=randomCV_clf,
    scoring="neg_log_loss",
    min_features_to_select=min_features_to_select,
    n_jobs=-1,
)
rfecv.fit(preprocessed_X_train_full, y_train_full)
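After fitting, it is easy to sanity-check what was selected and how much the folds disagree at each feature count; a short sketch using the standard RFECV attributes (n_features_, support_, ranking_, cv_results_):

print("Optimal number of features:", rfecv.n_features_)

selected_mask = rfecv.support_   # boolean mask over the original columns
feature_ranks = rfecv.ranking_   # 1 = selected; higher = eliminated earlier

# Mean and spread of the CV score at each feature count (step=1 here,
# so entry k corresponds to min_features_to_select + k features).
mean = rfecv.cv_results_["mean_test_score"]
std = rfecv.cv_results_["std_test_score"]
for k, (m, s) in enumerate(zip(mean, std)):
    print(f"{min_features_to_select + k:3d} features: {m:.4f} +/- {s:.4f}")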
The plot code is:
import matplotlib.pyplot as plt

n_scores = len(rfecv.cv_results_["mean_test_score"])
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Mean test neg_log_loss")
plt.errorbar(
    # With step=1, score k corresponds to min_features_to_select + k features.
    range(min_features_to_select, n_scores + min_features_to_select),
    rfecv.cv_results_["mean_test_score"],
    # yerr=rfecv.cv_results_["std_test_score"],  # error bars
)
plt.title("Recursive Feature Elimination\nwith correlated features")
plt.show()
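One way to decide whether the up-and-down pattern matters is to uncomment the yerr line above and compare the wiggles against the fold-to-fold spread, for example with a one-standard-error style rule: take the smallest feature count whose mean score is within one standard deviation of the best. A minimal sketch, assuming step=1 as above:

import numpy as np

mean = np.asarray(rfecv.cv_results_["mean_test_score"])
std = np.asarray(rfecv.cv_results_["std_test_score"])

best = int(mean.argmax())            # neg_log_loss: closer to 0 is better
threshold = mean[best] - std[best]

# Smallest feature count whose mean score clears the threshold.
candidates = np.nonzero(mean >= threshold)[0]
parsimonious_n = min_features_to_select + int(candidates.min())
print(f"Best mean score at {min_features_to_select + best} features; "
      f"smallest count within 1 std: {parsimonious_n}")

If many feature counts land within one standard deviation of the best, the step-to-step wiggles are probably fold noise rather than real structure.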
