Does sklearn pipeline cross-validation resample only the training fold and leave the hold-out/validation fold unbalanced at each CV fold?

Asked by displayname on 15 September 2023 at 17:35 (73 views)

For an imbalanced dataset, at each fold of K-fold cross-validation, I want to apply SMOTE resampling to the training-fold data to balance it, while leaving the hold-out/validation fold unbalanced. Can an imblearn pipeline used with sklearn (a "resample" step followed by a "classifier" step) achieve that? Or will the pipeline also balance the hold-out/validation data at each fold of cross-validation? I cannot find any official explanation of how cross-validation + resampling is implemented in sklearn.

I would like someone to confirm how sklearn pipeline cross-validation + resampling handles the resampling at each fold: does it create balanced data for both the training and hold-out/validation sets, or does it balance only the training set within each fold and leave the hold-out/validation set unbalanced? If possible, please share code or evidence.

