I want to use GridSearchCV to find the optimal n_neighbors parameter of KNeighborsClassifier
I want to use 'f1_score' metrics AND 'leave one out' strategy. But this code
clf = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': [1, 2, 3]}, cv=LeaveOneOut(), scoring='f1')
clf.fit(x_train, y_train)
leads to an error
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no true nor predicted samples. Use `zero_division` parameter to control this behavior.
I want to compute f1 score not of each fold of cross validation (it is not possible to compute f1 score of the only one test example), but to compute f1 score based on the whole iteration set with n_neighbors = n.
Is it possible using GridSearchCV?
Not sure if this functionality is directly available in Scikit-Learn, but you can implement the following function to get the desired outcome.
In particular, we will make a dummy scorer which just returns the predicted class instead of computing any score using the ground-truth and the prediction. In this way we can access the predictions of each hyperparameters combination on the different examples in the LOO cv.
The problem with this approach is that certain results available in the
cv_results_dictionary (and in certain attributes ofGridSearchCV) won't have any meaning, but that probably is not a problem. We should just remember to putrefit=False, sinceGridSearchCVdoesn't have a way to determine the best model.Now we can access the predictions through
cv_results_and just usef1_scoreto compute the metric for each hyperparams configuration.The function prints the following output: