I'm training a binary classification model in R with the xgboost package, and I use the ParBayesianOptimization package to tune some of the hyperparameters. Here is my code.
# dtrain is my input: 423 samples with 247 features (373 positive samples labeled 1 and 50 negative samples labeled 0)
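# NOTE: xgb.cv() below needs the labels, so dtrain is assumed to be an
# xgb.DMatrix rather than a plain matrix. A minimal sketch of how it could
# be built, where X (the 423 x 247 feature matrix) and y (the 0/1 label
# vector) are hypothetical names:
#   library(xgboost)
#   dtrain <- xgb.DMatrix(data = X, label = y)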
scoring_function <- function(eta, gamma, max_depth, colsample_bytree, alpha, nfold) {
  pars <- list(
    eta = eta,
    gamma = gamma,
    max_depth = max_depth,
    # min_child_weight = min_child_weight,
    colsample_bytree = colsample_bytree,
    alpha = alpha,
    # subsample = subsample,
    objective = "binary:logistic",
    eval_metric = "auc",
    verbosity = 0
  )
  xgbcv <- xgb.cv(
    params = pars,
    data = dtrain,
    nfold = nfold,
    nrounds = 100,
    prediction = TRUE,
    showsd = TRUE,
    early_stopping_rounds = 10,
    maximize = TRUE,
    stratified = TRUE
  )
  return(
    list(
      Score = max(xgbcv$evaluation_log$test_auc_mean),
      nrounds = xgbcv$best_iteration
    )
  )
}
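# Note on the return value: bayesOpt() maximizes the element named Score;
# any additional named elements (here nrounds) are not optimized but are
# recorded per iteration in opt_obj$scoreSummary, which is how the final
# nrounds is recovered further down.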
# set bounds for the parameters
bounds <- list(
  eta = c(0.1, 0.5),
  gamma = c(0, 1.9),
  max_depth = c(3L, 10L),          # integer bounds (the L suffix) are sampled as integers
  alpha = c(0, 0.7),
  colsample_bytree = c(0.1, 0.4),  # must be strictly positive, so the lower bound is raised from 0
  nfold = c(3L, 5L)
)
# Perform the search
library(ParBayesianOptimization)
set.seed(123)
time_noparallel <- system.time(
  opt_obj <- bayesOpt(
    FUN = scoring_function,
    bounds = bounds,
    initPoints = 7,
    iters.n = 8
  )
)
besttune <- getBestPars(opt_obj)
# use the best parameters to build the final model
params_1 <- list(
  eta = besttune$eta,
  gamma = besttune$gamma,
  max_depth = besttune$max_depth,
  objective = "binary:logistic",
  eval_metric = "auc",
  colsample_bytree = besttune$colsample_bytree,
  alpha = besttune$alpha  # alpha was tuned above, so it is passed on here as well
  # min_child_weight = besttune$min_child_weight,
  # subsample = besttune$subsample
)
# take the nrounds recorded at the best-scoring iteration(s); indexing one
# table keeps Score and nrounds aligned row by row
ss <- opt_obj$scoreSummary
best <- which(ss$Score == max(ss$Score, na.rm = TRUE))
nrounds <- max(ss$nrounds[best], na.rm = TRUE)
set.seed(123)
xgb2 <- xgb.train(
  params = params_1,
  data = dtrain,
  # dtest is a held-out validation xgb.DMatrix built the same way as dtrain
  watchlist = list(val = dtest, train = dtrain),
  nrounds = nrounds,
  print_every_n = 10,
  maximize = FALSE
)
But when I check the feature importance matrix of the model xgb2, I find that it contains only 20 features, while my classmate, who uses random forest on the same data, ends up with more than 30 features. Is there anything wrong with my procedure? If I want my model to pick features similar to those in the random forest model, how should I tune the hyperparameters? I would like to stick with the ParBayesianOptimization package if possible.
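For reference, this is how I check the importance matrix (a minimal sketch using xgboost's xgb.importance; xgb2 is the model trained above):

importance <- xgb.importance(model = xgb2)
nrow(importance)   # number of features used in at least one split
head(importance)   # per-feature Gain / Cover / Frequency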
In short, I want my xgboost model to use features similar to the ones the random forest model selects on the same dataset.