What is the purpose of standardization after one-hot encoding?


I was reading a tutorial for tidymodels and came across the following code block:

lr_recipe <- 
  recipe(children ~ ., data = hotel_other) %>% 
  step_date(arrival_date) %>% 
  step_holiday(arrival_date, holidays = holidays) %>% 
  step_rm(arrival_date) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors())

( This is the source of the code: https://www.tidymodels.org/start/case-study/#first-model )

Basically, the code lists a set of pre-processing operations on predictors, stored in a recipe object. My question arises from the following: first, step_dummy(all_nominal_predictors()) one-hot encodes the categorical predictors. Then, in a later step, step_normalize(all_predictors()) applies centering and scaling to all predictors (and therefore also to the encoded categorical ones). I am used to training models directly on one-hot encoded categorical predictors, without a further normalizing step. What is the advantage of normalizing one-hot encoded predictors? And how does it affect the interpretability of the model's predictions? Thanks for any clarification.
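To make the question concrete, here is a minimal sketch (in Python rather than R, purely for illustration) of what step_normalize does to a one-hot dummy column. The data and names are hypothetical, not from the tutorial:

```python
# Standardizing a 0/1 dummy column rescales it like any numeric predictor.
# The column and values below are made up for illustration.

def standardize(xs):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [(x - mean) / var ** 0.5 for x in xs]

# A one-hot dummy with 25% ones: mean 0.25, sd = sqrt(0.25 * 0.75) ~ 0.433.
dummy = [1, 0, 0, 0]
z = standardize(dummy)
# After standardization the column has mean 0 and unit variance, but it is
# no longer 0/1: a coefficient on z reads "change per one standard
# deviation", not "effect of belonging to the category".
```

This is the interpretability trade-off the question alludes to: the standardized dummy no longer takes the values 0 and 1, so its coefficient loses the direct "category effect" reading.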

There is 1 answer below.

topepo

If the binary dummy variables are the only predictors, the set of predictors is already on the same units/scale, so there is no need to do anything else.
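The answer's point can be sketched as follows (again in Python for illustration; the predictor names and values are hypothetical). Every 0/1 dummy spans the same [0, 1] range, whereas a continuous predictor can span a very different range, and that mismatch is what normalization addresses for scale-sensitive models such as the penalized logistic regression used later in that case study:

```python
# Two 0/1 dummies share the same scale; a continuous predictor does not.
# All data below is made up for illustration.

dummy_a = [0, 1, 1, 0, 1]          # e.g. a hypothetical month indicator
dummy_b = [1, 0, 0, 0, 0]          # e.g. a hypothetical holiday indicator
lead_time = [3, 45, 120, 7, 300]   # hypothetical continuous predictor, in days

def span(xs):
    """Range of a column: max minus min."""
    return max(xs) - min(xs)

# Both dummies have span 1, so relative to each other they are already
# comparable. lead_time spans hundreds of units, so without centering and
# scaling its coefficient would face a very different effective penalty
# under L1/L2 regularization.
```

In other words, normalizing dummies is harmless redundancy when dummies are the only predictors, and it mainly matters when dummies are mixed with continuous predictors in a regularized or distance-based model.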