Optimizing parallelization in R using Tidymodels, Ranger and Workflowsets - slow

I am trying to run 35 different models (each with 5-fold CV and a tuning grid of size 4, so in essence 700 models) using the Tidymodels implementation of Ranger.

I am using workflowsets to create the 35 different specifications. Then I am using tune_race_anova to tune mtry and min_n.

However, the runs are very slow. At first, I used the following settings:

cl <- makePSOCKcluster(parallel::detectCores(logical = FALSE))
registerDoParallel(cl, cores = cl-1)

race_ctrl <-
   control_race(
      save_pred = FALSE,
      parallel_over = "everything",
      save_workflow = FALSE,
      burn_in = 2
   )

start_time <- Sys.time()
race_results <-
   rf_workflows %>%
   workflow_map(
      "tune_race_anova",
      seed = 1503,
      resamples = train_folds,
      grid = 4,
      control = race_ctrl
   )

end_time <- Sys.time()

This resulted in only a few cores being used at a time, and the CPU never really maxed out. When I add the num.threads argument to set_engine() (I am giving it 18), it still does not use many cores, but the CPU maxes out at 100%.

I am running on a server, so parallel::detectCores() returns 32. The server has 264 GB of RAM. However, not all of the RAM is used: it is allocated to cores that are not actually doing any work.

My resampling looks like this:

train_folds  <- 
   rsample::vfold_cv(data_train, strata = ON, v= 5, repeats = 2)
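As a rough check on the workload (a sketch using only the settings shown in the question): with v = 5 and repeats = 2, each candidate is evaluated on 10 resamples, not 5. If racing eliminated nothing, the worst case would therefore be 35 * 4 * 10 fits. Racing is designed to stop evaluating poor candidates early, so the actual count should be lower.

```r
# Settings taken from the question: 35 workflows, grid = 4, v = 5, repeats = 2
n_workflows <- 35
grid_size   <- 4
n_folds     <- 5
n_repeats   <- 2

# Each candidate is fit once per resample (fold x repeat)
n_resamples <- n_folds * n_repeats                    # 10
# Upper bound, assuming racing drops no candidates
max_fits    <- n_workflows * grid_size * n_resamples  # 1400

print(n_resamples)
print(max_fits)
```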

How can I speed things up and make sure parallelization uses as many cores as possible? Only 8 cores are busy (100% CPU) and only 30% of the RAM is used. The other 24 cores sit at 0% CPU, each with around 1 GB of RAM allocated.

Could it not somehow parallelize over each workflow in the workflow set?

Answer by topepo:

I think the issue is with

registerDoParallel(cl, cores = cl-1)

I'm surprised that cl - 1 doesn't produce an error:

> cl <- makePSOCKcluster(parallel::detectCores(logical = FALSE))
> cl - 1
Error in cl - 1 : non-numeric argument to binary operator
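A corrected registration might look like the sketch below. It assumes the doParallel package (which also attaches foreach and parallel). When registerDoParallel() is given a cluster object, the number of workers comes from the cluster itself, so the cores argument can simply be dropped and the "minus one" is applied when the cluster is created. The n_workers name is illustrative.

```r
library(doParallel)  # also attaches foreach and parallel

# Leave one physical core for the main R session. detectCores() can return NA
# on some platforms; na.rm = TRUE then falls back to a single worker.
n_workers <- max(1L, parallel::detectCores(logical = FALSE) - 1L, na.rm = TRUE)

cl <- makePSOCKcluster(n_workers)
registerDoParallel(cl)  # worker count is taken from the cluster object

print(foreach::getDoParWorkers())  # should equal n_workers

# ... run workflow_map("tune_race_anova", ...) here ...

# Shut the workers down when tuning is finished
stopCluster(cl)
```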

Regardless, here is how parallelism works with racing:

For the initial "burn in" phase, you can parallelize 5 * 2 * 2 * num_configs tasks.

After that, the selection phase will be able to parallelize 5 * 2 * num_remaining_configs.
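To see what those formulas imply for the question's settings (grid = 4), here is a small sketch that evaluates them directly. The number of remaining configurations depends on which candidates the ANOVA filter removes, so the 4:1 sequence is purely illustrative, as is the 31-worker cluster size. Since a worker cannot be busy without a task, the number of busy workers in a phase is at most the number of tasks in that phase.

```r
# Parallel task counts per racing phase, using the formulas stated above
burn_in_tasks   <- function(num_configs) 5 * 2 * 2 * num_configs
selection_tasks <- function(num_remaining_configs) 5 * 2 * num_remaining_configs

# Workers that can be kept busy: never more than the number of tasks
busy_workers <- function(tasks, n_workers) pmin(tasks, n_workers)

burn_in_tasks(4)                         # 80
selection_tasks(4:1)                     # 40 30 20 10
busy_workers(selection_tasks(4:1), 31)   # 31 30 20 10 (hypothetical 31 workers)
```

As candidates are eliminated, the pool of tasks shrinks, so late in a race only a fraction of a large cluster can be kept busy.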