How to use a custom scorer function in GridSearchCV


I am using LightGBM as an estimator in a grid search. I want to change the scoring function in GridSearchCV: the function should compute the decision threshold at which the false positive rate stays below a target, using my X_test and y_test data. I've tried a custom scorer:

import lightgbm as lgb
from lightgbm import early_stopping
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

grid_parameters = {
    'bagging_fraction': [0.8, 0.9],
    'bagging_freq': [2, 3],
    'min_data_in_leaf': [100, 250]
}
gbm_model = lgb.LGBMClassifier(boosting_type="gbdt",
                               objective="binary",
                               metric="l2,auc,binary",
                               learning_rate=0.1,
                               num_iterations=5000)

def summary_of_threshold(estimator, X, y):
    fpr_target = 0.0006
    y_pred = estimator.predict(X_test)
    threshold, fpr = find_threshold(y_test.label, y_pred, fpr_target)
    print(y_pred[:10])
    print("#####################{}".format(fpr))
    return fpr
# X_test and y_test (global variables) are my test dataset; the score
# function finds the threshold at a target false positive rate.

gsearch = GridSearchCV(gbm_model, param_grid=grid_parameters, scoring=summary_of_threshold, cv=3, n_jobs=1)
early_stopping_rounds = 300
gsearch.fit(
    X=X_train, 
    y=y_train.label,
    eval_set=[(X_val, y_val.label)],
    eval_metric=["l2","auc","binary"],
    callbacks=[early_stopping(early_stopping_rounds)],
    categorical_feature=categorical_features,
    verbose=10
)

The result is not what I expected: the y_pred values predicted by the estimator are all 0 (I printed the first ten values to check). I want to know why the estimator predicts all zeros.
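One likely explanation: LGBMClassifier.predict returns hard 0/1 class labels at a default 0.5 cutoff, and with a target FPR as low as 0.0006 the positive class is presumably very rare, so every label can come out 0; predict_proba gives the continuous scores a threshold search needs. The scorer above also ignores the X and y that GridSearchCV passes in and always scores the same global X_test/y_test. A minimal sketch of a corrected direct-callable scorer, assuming find_threshold accepts continuous scores, could look like this:

def summary_of_threshold(estimator, X, y):
    fpr_target = 0.0006
    # predict_proba gives continuous scores; predict() returns the hard
    # 0/1 labels that came out as all zeros above
    y_scores = estimator.predict_proba(X)[:, 1]
    # score the fold GridSearchCV passes in, not a fixed global test set
    threshold, fpr = find_threshold(y, y_scores, fpr_target)
    # GridSearchCV maximizes a callable's return value, so negate the
    # FPR if a lower value is considered better
    return -fpr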

Before this, I tried the make_scorer factory:

customized_scorer = make_scorer(summary_of_threshold, greater_is_better=False)
gsearch = GridSearchCV(gbm_model, param_grid=grid_parameters, scoring=customized_scorer, cv=3, n_jobs=1)
early_stopping_rounds = 300
gsearch.fit(
    X=X_train, 
    y=y_train.label,
    eval_set=[(X_val, y_val.label)],
    eval_metric=["l2","auc","binary"],
    callbacks=[early_stopping(early_stopping_rounds)],
    categorical_feature=categorical_features,
    verbose=10
)

but I encountered the following error:

/root/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:821: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 810, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/root/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 266, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/root/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_scorer.py", line 355, in _score
    return self._sign * self._score_func(y_true, y_pred, **scoring_kwargs)
TypeError: summary_of_threshold() missing 1 required positional argument: 'y'

There is 1 answer below.

Answer by Leo

The problem is with your function definition:

def summary_of_threshold(estimator,X,y):

Metric functions passed to make_scorer take two arguments: y_true and y_pred. See, for example, balanced_accuracy_score for reference: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html#sklearn.metrics.balanced_accuracy_score

So it should look more like this:

def summary_of_threshold(y_true, y_pred):
    # do stuff with y_true and y_pred

All sklearn metric functions work like that. So you need to define your function such that it returns the evaluation metric based on the true, observed y and the predicted y, then turn it into a scorer; sklearn takes care of the rest. See also the very good user guide: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring

Also DO NOT implement this with global variables as you do at the moment. All you need to do is something like this:

def summary_of_threshold(y_true, y_pred):
    fpr_target = 0.0006
    threshold, fpr = find_threshold(y_true, y_pred, fpr_target)
    return fpr

customized_scorer = make_scorer(summary_of_threshold, greater_is_better=False)
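
The question never shows find_threshold, so here is one hypothetical sketch of it built on sklearn.metrics.roc_curve, wired into GridSearchCV. The needs_proba=True flag matters here: without it, make_scorer passes the hard 0/1 labels from predict() to the metric, which would reproduce the all-zeros problem from the question.

import numpy as np
from sklearn.metrics import make_scorer, roc_curve
from sklearn.model_selection import GridSearchCV

def find_threshold(y_true, y_scores, fpr_target):
    # Hypothetical helper: scan the ROC curve and return the decision
    # threshold with the largest FPR not exceeding the target.
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    idx = max(np.searchsorted(fpr, fpr_target, side="right") - 1, 0)
    return thresholds[idx], fpr[idx]

def summary_of_threshold(y_true, y_scores):
    fpr_target = 0.0006
    threshold, fpr = find_threshold(y_true, y_scores, fpr_target)
    return fpr

# needs_proba=True feeds predict_proba scores (not hard labels) to the
# metric; in sklearn >= 1.4 use response_method="predict_proba" instead.
customized_scorer = make_scorer(summary_of_threshold,
                                greater_is_better=False,
                                needs_proba=True)
gsearch = GridSearchCV(gbm_model, param_grid=grid_parameters,
                       scoring=customized_scorer, cv=3, n_jobs=1)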