How can I make lgb.cv use my own k-fold splits?


I designed a method that splits the data into 5 folds, and I want to use those folds for 5-fold cross-validation.

from load_data import load_data
folds, test_samples, input_shape = load_data()
folds[0].keys()
# dict_keys(['train', 'val', 'test'])

To use these specific 5 folds to tune a GBM model with any optimization method (e.g., random search or grid search), I need to train 5 models for each hyperparameter configuration and then evaluate their performance.

One way to do this is to iterate over the folds and train a model on each one using:

import lightgbm as lgb

early_stopping = lgb.early_stopping(stopping_rounds=10)
model = lgb.LGBMClassifier()
model.fit(X, y, callbacks=[early_stopping], ...)  # eval_set and other arguments omitted

Another option I found is lgb.cv, but it does not seem to accept my own folds.

Does anyone know how to run lgb.cv with my own folds instead of its built-in splitting?

Here is a code snippet for a single hyperparameter configuration:

import numpy as np
import lightgbm as lgb
from sklearn.metrics import roc_auc_score, accuracy_score
from timeit import default_timer as timer

# folds comes from load_data(); feat_names is defined elsewhere
for i, fold in enumerate(folds):
    print('Fold', i + 1)
    # folds is a list, so each item is already the fold dict
    train, val = fold['train'], fold['val']
    early_stopping = lgb.early_stopping(stopping_rounds=10)
    model = lgb.LGBMClassifier()

    start = timer()
    # Early stopping monitors every non-training set in eval_set, so the
    # test set is kept out of it to avoid leaking test data into training
    model.fit(train['x'], train['y'],
              callbacks=[early_stopping],
              eval_set=[
                  (train['x'], train['y']),
                  (val['x'], val['y'])],
              eval_names=['train', 'val'],
              eval_metric=['auc', 'binary_logloss'],
              feature_name=feat_names)
    train_time = timer() - start

    # Make predictions on the validation set
    predictions = model.predict_proba(val['x'])
    auc = roc_auc_score(val['y'], predictions[:, 1])
    acc = accuracy_score(val['y'], np.argmax(predictions, axis=1))

    print('Validation accuracy: {:.4f}'.format(acc))
    print('Validation AUC: {:.4f}'.format(auc))
    print('Training time: {:.4f} seconds'.format(train_time))

How can I adapt this to a hyperparameter search (e.g., random search)?
