I am having trouble getting step_interact() from tidymodels to produce the set of predictor variables I want. I want to include all pairwise interactions between different categorical variables, but exclude every within-category interaction (an interaction between two levels of the same variable).
Suppose I have a dataset with the following categorical variables:
library(tidyverse)
library(tidymodels)
set.seed(12345)
dat <- tibble(
  outcome = sample(c("y", "n"), 100, replace = TRUE),
  race = sample(c("b", "w", "o"), 100, replace = TRUE),
  hisp = sample(c("hisp", "nhisp"), 100, replace = TRUE),
  cat = sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE)
)
I then want to create a set of interactions that crosses every race category with every hisp category, every race category with every cat category, and every hisp category with every cat category.
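To make the target concrete, here is a base R sketch of the interaction column names I would expect to end up with. It assumes step_dummy() keeps its default treatment contrasts (so the first level of each variable is dropped) and that step_interact() keeps its default "_x_" separator. The `dummies`, `pairs`, and `wanted` objects are names made up for this illustration and are not part of any recipe.

```r
# Dummy column names that step_dummy() is assumed to create with its defaults
# (the alphabetically first level of each variable is the dropped reference).
dummies <- list(
  race = c("race_o", "race_w"),
  hisp = "hisp_nhisp",
  cat  = c("cat_b", "cat_c", "cat_d", "cat_e")
)

# Every unordered pair of distinct variables: race-hisp, race-cat, hisp-cat.
pairs <- combn(names(dummies), 2, simplify = FALSE)

# For each pair, cross every dummy of the first variable with every dummy of
# the second, joining the two names with the "_x_" separator.
wanted <- unlist(lapply(pairs, function(p) {
  as.vector(outer(dummies[[p[1]]], dummies[[p[2]]], paste, sep = "_x_"))
}))

wanted
```

Under those assumptions this gives 14 names (2 race-by-hisp, 8 race-by-cat, 4 hisp-by-cat), none of which pairs two levels of the same variable.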
I tried the following recipe:
rec <- recipe(outcome ~ ., data = dat) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(~ (starts_with("race_") +
                     starts_with("hisp_") +
                     starts_with("cat"))^2) %>%
  prep()
colnames(bake(rec, new_data = dat))
However, this produces predictors such as "cat_b_x_cat_e", "cat_c_x_cat_d", "race_o_x_race_w", and so on.
How can I omit these within-category interactions but keep the across-category ones (e.g., "race_w_x_cat_b", "race_w_x_hisp_nhisp")?