For an imbalanced dataset, I want to apply SMOTE resampling to the training data in each fold of K-fold cross-validation so that it is balanced, while leaving the hold-out/validation fold unbalanced. Can an imblearn pipeline used with sklearn (a "resample" step followed by a "classifier" step) achieve that? Or will the pipeline also balance the hold-out/validation data in each fold of cross-validation? I cannot find any official explanation of how cross-validation combined with resampling is implemented in sklearn.

I would like someone to confirm how the pipeline handles resampling in each cross-validation fold: does it create balanced data for both the training and hold-out/validation sets, or does it balance only the training set and leave the hold-out/validation set unbalanced? If possible, please share code or evidence for it.
