I trained an XGBClassifier on a few hundred samples, around 90% of which belong to one class; the class distribution (9:1) is the same in the train and test sets, but the model is overfitting:
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold, GridSearchCV
import matplotlib.pyplot as plt
# Scale data
X_train_scaled, X_test_scaled = scale_data(X_train, X_test)
# Eval metrics and early stopping
XGBC = XGBClassifier(random_state=42, eval_metric=['auc', 'logloss'],
                     early_stopping_rounds=5)
# A parameter grid for XGBoost
params = {
    'learning_rate': [0.1, 0.02, 0.03, 0.04],
    'n_estimators': [300, 400, 500],
    'max_depth': [2, 3, 6, 8],
    'random_state': [42],
    'scale_pos_weight': [9.0],  # ratio of negative to positive samples
    'lambda': [1, 5, 7],
    'alpha': [0, 5, 7],
    'max_delta_step': [0, 1, 3, 5],
    'gamma': [0, 2, 4, 5],
    'subsample': [0.7, 0.8, 0.9],
    'colsample_bytree': [0.4, 0.5],
    'colsample_bylevel': [0.4, 0.5],
    'colsample_bynode': [0.4, 0.5],
    'min_child_weight': [3, 5, 7]
}
# Parameters to fit
fit_parms = {'eval_set': [(X_train_scaled, y_train), (X_test_scaled, y_test)],
             'verbose': False}
# Stratified cv
skf = StratifiedKFold(5)
# Call grid search cv
grid = GridSearchCV(
    estimator=XGBC,
    param_grid=params,
    scoring='roc_auc',
    n_jobs=32,
    cv=skf.split(X_train_scaled, y_train),
    verbose=1,
    refit=True
)
# Fit model
grid.fit(X_train_scaled, y_train, **fit_parms)
# Get the best estimator
best_model = grid.best_estimator_
# Get the evaluation results
best_evals_result = best_model.evals_result()
metrics = ['auc', 'logloss']
ylabs = ['AUC', 'log loss']
titles = ['XGBoost AUC', 'XGBoost log loss']
plt.figure(figsize=(8, 3), dpi=300)
for i in range(len(metrics)):
    plt.subplot(1, 2, i + 1)
    basic_plot(yvals=best_evals_result, metric=metrics[i],
               xlab='Iterations', ylab=ylabs[i], title=titles[i])
plt.tight_layout()
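(`basic_plot` is a helper of mine that isn't shown above; for anyone who wants to reproduce the figure, here is a minimal hypothetical sketch of it, assuming the dict layout that `evals_result()` returns for a two-entry `eval_set`, where `validation_0` is the train set and `validation_1` the test set.)

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def basic_plot(yvals, metric, xlab, ylab, title):
    """Plot train vs. test curves for one metric from evals_result().

    Hypothetical reconstruction -- the original helper is not shown above.
    """
    plt.plot(yvals['validation_0'][metric], label='train')  # first eval_set entry
    plt.plot(yvals['validation_1'][metric], label='test')   # second eval_set entry
    plt.xlabel(xlab)
    plt.ylabel(ylab)
    plt.title(title)
    plt.legend()
```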
I tried tuning the parameters that are supposed to control overfitting, such as max_depth, scale_pos_weight, max_delta_step, and colsample_bytree, but it is not working. Here are the best parameters found by the search above. Any ideas would be very helpful.
{'alpha': 5,
'colsample_bylevel': 0.4,
'colsample_bynode': 0.4,
'colsample_bytree': 0.5,
'gamma': 2,
'lambda': 5,
'learning_rate': 0.1,
'max_delta_step': 0,
'max_depth': 3,
'min_child_weight': 5,
'n_estimators': 300,
'random_state': 42,
'scale_pos_weight': 9.0,
'subsample': 0.8}
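To quantify the overfitting rather than eyeball the curves, I compare train vs. test ROC AUC for the refit model. Since my actual X/y aren't included here, this is a self-contained stand-in using synthetic 9:1 data and sklearn's GradientBoostingClassifier; with the real data you would score `best_model` on `X_train_scaled`/`X_test_scaled` the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: a few hundred samples, ~9:1 class ratio
# (stand-in for the real dataset, which is not reproduced here)
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Stand-in model; in my case this would be grid.best_estimator_
model = GradientBoostingClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)

# A large train-test gap is the overfitting signature
train_auc = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}, "
      f"gap = {train_auc - test_auc:.3f}")
```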
