I want to create an ensemble model with user-defined weights.
If I create multiple submodels using tidymodels, I want to produce a final model that puts equal weight on each submodel. The stacks package is great at estimating optimal weights, but sometimes I just want equal weight on each submodel. stacks is also convenient because I can pass the "stacked" model object to the DALEXtra package to help explain the final ensemble model.
Here is an example of what I'm doing.
## load in packages
library(tidymodels)
library(stacks)
library(DALEXtra)
# get a sample of the ames dataset
set.seed(1)
df <- ames %>%
  sample_n(500)
# some setup: resampling and a basic recipe
set.seed(1)
df_splits <- initial_split(df)
df_train <- training(df_splits)
df_test <- testing(df_splits)
set.seed(1)
df_folds <- vfold_cv(df_train, v = 4)
rec_small <- recipe(Sale_Price ~ Gr_Liv_Area, data = df)
rec_big <- recipe(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF, data = df)
# setting up my one model type
rand_forest_ranger_spec <-
  rand_forest() %>%
  set_engine('ranger') %>%
  set_mode('regression')
# setting up my one workflow set of my two recipes and one model type
wf_rfs <-
  workflow_set(
    preproc = list(rec_small,
                   rec_big),
    models = list(rf = rand_forest_ranger_spec)
  )
# estimating my two random forest models
grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )
grid_results <-
  wf_rfs %>%
  workflow_map(
    seed = 1503,
    resamples = df_folds,
    control = grid_ctrl
  )
# setting up our stacking
df_st <-
  stacks() %>%
  add_candidates(grid_results)
set.seed(1)
df_model_st <-
  df_st %>%
  blend_predictions()
# looking at final estimated model
df_model_st$equations$numeric
#### I got
#### -42148.1667470673 + (recipe_1_rf_1_1 * 0.13109783287876) + (recipe_2_rf_1_1 * 1.08833216052151)
#### but what I want is something with user-defined weights, like
#### 0 + (recipe_1_rf_1_1 * .5) + (recipe_2_rf_1_1 * .5)
I could carry on with this stacks model and use DALEXtra to explain the ensemble with some global model explanations, something like this:
# Fit an ensemble model using that stacks
df_model_st_fitted <-
  df_model_st %>%
  fit_members()
# I want to be able to use the cool DALEX tools to explain a user-defined weighted ensemble model
vip_features <- c("Gr_Liv_Area", "BsmtFin_SF_1", "First_Flr_SF", "Second_Flr_SF")
vip_train <-
  df %>%
  select(all_of(vip_features))
# Setting up the explainer
explainer_blended_rf <-
  explain_tidymodels(
    df_model_st_fitted,
    data = vip_train,
    y = df$Sale_Price,
    label = "Blended Random Forest",
    verbose = FALSE
  )
# using the explainer to produce a VIP
vip_example <-
  explainer_blended_rf %>%
  model_parts()
plot(vip_example)
# using the explainer to produce AL plots
al_rf <- model_profile(
  explainer = explainer_blended_rf,
  type = "accumulated",
  variables = names(vip_train)
)
plot(al_rf) +
  ggtitle("Accumulated-local profiles")
In sum, I love stacks and its ability both to estimate weights and to create a model object that can be used later like any other tidymodels model. But I don't want the weights estimated by stacks; I want to supply my own. I don't know whether I should be doing something within stacks to get the weights I want, or whether I should skip stacks altogether because I already know the weights. In the latter case, I don't know how to create an ensemble model the way stacks does, so that I can use it later like a tidymodels model.
One approach is to collect each model's predictions manually and average them: the predictions are stored in a list column on your results tibble, so you can compute the equally weighted mean prediction for each row.
Something like this:
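A minimal sketch of that idea, assuming the same data and model spec as in the question. The names `rs_results`, `weights`, and `ensemble_oof` are made up for illustration, and the recipes are named `small` and `big` here (my assumption) so that the workflow ids become `small_rf` and `big_rf`. Since nothing is tuned, it calls `fit_resamples` through `workflow_map()`. The result is a set of out-of-fold ensemble predictions, which is useful for checking how an equal-weight blend performs, not a fitted model object.

```r
library(tidymodels)

set.seed(1)
df <- ames %>% sample_n(500)
df_train <- training(initial_split(df))
df_folds <- vfold_cv(df_train, v = 4)

rf_spec <-
  rand_forest() %>%
  set_engine("ranger") %>%
  set_mode("regression")

# Naming the recipes gives predictable workflow ids: "small_rf" and "big_rf"
wf_rfs <-
  workflow_set(
    preproc = list(
      small = recipe(Sale_Price ~ Gr_Liv_Area, data = df_train),
      big = recipe(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF,
                   data = df_train)
    ),
    models = list(rf = rf_spec)
  )

# Resample both workflows and keep the out-of-fold predictions
rs_results <-
  workflow_map(
    wf_rfs,
    "fit_resamples",
    seed = 1503,
    resamples = df_folds,
    control = control_resamples(save_pred = TRUE)
  )

# User-defined weights, keyed by workflow id (hypothetical values)
weights <- c(small_rf = 0.5, big_rf = 0.5)

# One row per training observation: the weighted sum of member predictions
ensemble_oof <-
  rs_results %>%
  collect_predictions() %>%
  mutate(w = unname(weights[wflow_id])) %>%
  group_by(.row) %>%
  summarise(
    Sale_Price = first(Sale_Price),
    .pred = sum(w * .pred),
    .groups = "drop"
  )

rmse(ensemble_oof, truth = Sale_Price, estimate = .pred)
```

Changing the entries of `weights` (keeping the names aligned with the workflow ids) gives any other fixed blend.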
To get variable importances for your ensemble, the vip package lets you supply a custom prediction wrapper.
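The same wrapper idea also works with DALEX, which the question already uses. The sketch below is one way to do it under stated assumptions, not the stacks API: the "model" is just a plain list holding the fitted member workflows and the weights, and `fit_member()`, `predict_weighted()`, and `ensemble` are hypothetical names. `DALEX::explain()` is called with an explicit `predict_function`, and with the namespace prefix because dplyr also exports an `explain()`.

```r
library(tidymodels)
library(DALEX)

set.seed(1)
df <- ames %>% sample_n(500)

rf_spec <-
  rand_forest() %>%
  set_engine("ranger") %>%
  set_mode("regression")

# Hypothetical helper: fit one member workflow from a formula
fit_member <- function(formula) {
  workflow() %>%
    add_recipe(recipe(formula, data = df)) %>%
    add_model(rf_spec) %>%
    fit(data = df)
}

# The "ensemble" is a plain list: fitted members plus user-defined weights
ensemble <- list(
  members = list(
    small = fit_member(Sale_Price ~ Gr_Liv_Area),
    big = fit_member(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF)
  ),
  weights = c(small = 0.5, big = 0.5)
)

# Prediction wrapper: weighted sum of the member predictions
predict_weighted <- function(model, newdata) {
  preds <- do.call(
    cbind,
    lapply(model$members, function(m) predict(m, new_data = newdata)$.pred)
  )
  drop(preds %*% model$weights[colnames(preds)])
}

vip_features <- c("Gr_Liv_Area", "BsmtFin_SF_1", "First_Flr_SF", "Second_Flr_SF")

explainer_equal_rf <-
  DALEX::explain(
    model = ensemble,
    data = df %>% select(all_of(vip_features)),
    y = df$Sale_Price,
    predict_function = predict_weighted,
    type = "regression",
    label = "Equal-weight RF ensemble",
    verbose = FALSE
  )

# Permutation importance and accumulated-local profiles, as in the question
plot(model_parts(explainer_equal_rf))
plot(model_profile(explainer_equal_rf, type = "accumulated", variables = vip_features))
```

Because the weights live in the list, you can change them without refitting the members. A function shaped like `predict_weighted()` (object first, new data second) should also be usable as the prediction wrapper for vip's permutation importance, though I have not checked that against every vip version.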