I am working on a heart disease classification problem using RandomForestClassifier. While tuning its hyperparameters with GridSearchCV, I get the warning shown below. I am using sklearn's Pipeline and ColumnTransformer for preprocessing.
Error: 720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
UserWarning: One or more of the test scores are non-finite
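from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Scale the numerical columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)
# One-hot encode the categorical columns, ignoring categories unseen during fit
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)
preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
# Fit the preprocessor on the training split only, then apply it to the test split
scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)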
param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]
              }
grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5,
                    scoring='accuracy', verbose=True, n_jobs=-1)
grid.fit(scaled_X_train, y_train)
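From the error message, it looks like some of the hyperparameter combinations are invalid: most of your fits run fine, but a third of them fail. That fraction is a clue. Your grid has 4 × 3 × 4 × 3 × 3 = 432 combinations, so cv=5 gives 2160 fits, and the 144 combinations with min_samples_split=1 account for exactly 720 of them. Remove 1 from the list of values for min_samples_split, as an integer value has to be 2 or greater.
If that doesn't resolve the error, add error_score='raise' to GridSearchCV, so that when a fit fails it raises the exception with the full stack trace instead of setting the score to nan.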
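A minimal sketch of that diagnosis and fix, for illustration only. It uses ParameterGrid to count how many fits involve min_samples_split=1, then runs a corrected search with error_score='raise'. The synthetic data from make_classification and the names X_demo, y_demo, fixed_grid and small_grid are assumptions standing in for your preprocessed heart disease data and full grid; the search is shrunk so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Grid exactly as posted in the question.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "n_estimators": [10, 100, 200],
    "max_features": [1, 3, 5, 7],
    "min_samples_leaf": [1, 2, 3],
    "min_samples_split": [1, 2, 3],
}
cv = 5

# Count the fits that use the invalid integer value min_samples_split=1.
combos = list(ParameterGrid(param_grid))
bad = [p for p in combos if p["min_samples_split"] < 2]
total_fits = len(combos) * cv   # 432 combinations * 5 folds = 2160
failed_fits = len(bad) * cv     # 144 combinations * 5 folds = 720
print(f"{failed_fits} of {total_fits} fits use min_samples_split=1")

# Corrected grid: the same search space without the invalid value.
fixed_grid = dict(param_grid, min_samples_split=[2, 3])

# Assumption: synthetic stand-in for scaled_X_train / y_train.
X_demo, y_demo = make_classification(n_samples=120, n_features=8, random_state=0)

# The invalid value is rejected when a single estimator is fitted directly.
try:
    RandomForestClassifier(n_estimators=5, min_samples_split=1).fit(X_demo, y_demo)
    rejected = False
except ValueError:
    rejected = True

# Assumption: a reduced grid so the demo finishes quickly; use fixed_grid for
# the real search. error_score='raise' surfaces any remaining failure at once.
small_grid = {
    "n_estimators": [10],
    "max_features": [1, 3],
    "min_samples_split": fixed_grid["min_samples_split"],
}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=cv,
    scoring="accuracy",
    error_score="raise",
)
grid.fit(X_demo, y_demo)
print(grid.best_params_)
```

Once the search runs cleanly you can drop error_score='raise' again; its default of nan is what lets the remaining fits continue when only some combinations fail.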