Excluding within-category interactions with step_interact()


I am having trouble getting step_interact() from the recipes package (part of tidymodels) to produce the set of predictors I want. I want to include pairwise interactions, but exclude every interaction between dummy columns that come from the same categorical variable (within-category interactions).

Suppose I have a dataset with the following categorical variables:

library(tidyverse)
library(tidymodels)

set.seed(12345)
dat <- tibble(outcome = sample(c("y", "n"), 100, replace = TRUE),
              race = sample(c("b", "w", "o"), 100, replace = TRUE),
              hisp = sample(c("hisp", "nhisp"), 100, replace = TRUE),
              cat = sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE))

I then want to interact every race dummy with every hisp dummy, every race dummy with every cat dummy, and every hisp dummy with every cat dummy. In other words, I want all pairwise interactions between dummies from different variables.
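With three variables this amounts to three pairs: race/hisp, race/cat and hisp/cat. A minimal base-R sketch of enumerating those pairs and turning them into an interaction formula is below. It assumes the dummy columns keep the `<variable>_` prefix that step_dummy() gives them, and `cross_category_formula()` is a hypothetical helper, not part of recipes:

```r
# Hypothetical helper: build an interaction formula that pairs dummy columns
# only across *different* variables, never within the same one.
cross_category_formula <- function(vars) {
  # All unordered pairs of distinct variables, e.g. race/hisp, race/cat, hisp/cat
  pairs <- combn(vars, 2, simplify = FALSE)
  # One selector-based "a:b" term per pair; assumes dummy columns are named
  # "<variable>_<level>", as step_dummy() produces by default
  term_strs <- vapply(
    pairs,
    function(p) sprintf('starts_with("%s_"):starts_with("%s_")', p[1], p[2]),
    character(1)
  )
  as.formula(paste("~", paste(term_strs, collapse = " + ")))
}

int_form <- cross_category_formula(c("race", "hisp", "cat"))
print(int_form)
```

The resulting formula could then be passed as `step_interact(terms = int_form)`. This assumes step_interact() resolves the selectors inside each `:` term the same way it resolves them inside `(...)^2`.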

I tried the following recipe:

rec <- recipe(outcome ~ ., data = dat) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(~ (starts_with("race_") +
                     starts_with("hisp_") +
                     starts_with("cat_"))^2) %>%
  prep()

colnames(bake(rec, new_data = dat))

However, this produces predictors such as "cat_b_x_cat_e", "cat_c_x_cat_d", "race_o_x_race_w", and so on.
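Each row falls into exactly one level of a given variable, so the product of two dummy columns from the same variable is zero in every row and carries no information. The following base-R sketch flags such columns by name. It assumes step_interact()'s default `_x_` separator and variable names that contain no underscore. `is_within_category()` is a hypothetical helper:

```r
# Hypothetical helper: flag two-way interaction columns whose halves come from
# the same categorical variable. Assumes the "_x_" separator and dummy names of
# the form "<variable>_<level>" with no underscore inside <variable>.
is_within_category <- function(col_names, sep = "_x_") {
  parts <- strsplit(col_names, sep, fixed = TRUE)
  vapply(parts, function(p) {
    # Non-interaction columns (or higher-order ones) are not flagged
    if (length(p) != 2) return(FALSE)
    # The variable name is everything before the first underscore
    sub("_.*$", "", p[1]) == sub("_.*$", "", p[2])
  }, logical(1))
}

cols <- c("race_o_x_race_w", "cat_b_x_cat_e", "cat_c_x_cat_d",
          "race_w_x_hisp_nhisp", "race_w_x_cat_b", "hisp_nhisp")
cols[is_within_category(cols)]
```

The flagged names could then be dropped from the baked data, although this still creates the unwanted columns first rather than avoiding them.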

How can I omit these within-category interactions but keep the across-category ones (e.g., "race_w_x_cat_b", "race_w_x_hisp_nhisp")?
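One approach that should avoid the within-category terms is to replace `(...)^2` with one explicit `:` term per pair of different variables. In an R formula, `(a1 + a2):(b1 + b2)` expands only to products across the two groups, never within one. The sketch below assumes the same `dat` as above and step_dummy()'s default reference coding, so there are no `race_b`, `hisp_hisp` or `cat_a` columns:

```r
library(tidymodels)

set.seed(12345)
dat <- tibble(outcome = sample(c("y", "n"), 100, replace = TRUE),
              race = sample(c("b", "w", "o"), 100, replace = TRUE),
              hisp = sample(c("hisp", "nhisp"), 100, replace = TRUE),
              cat = sample(c("a", "b", "c", "d", "e"), 100, replace = TRUE))

# Instead of (race + hisp + cat)^2, list one ":" term per pair of *different*
# variables. Each selector expands to that variable's dummy columns, and a:b
# only crosses columns of a with columns of b, so no within-category products.
rec <- recipe(outcome ~ ., data = dat) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(~ starts_with("race_"):starts_with("hisp_") +
                  starts_with("race_"):starts_with("cat_") +
                  starts_with("hisp_"):starts_with("cat_")) %>%
  prep()

baked <- bake(rec, new_data = dat)
grep("_x_", colnames(baked), value = TRUE)
```

With two race dummies, one hisp dummy and four cat dummies, this should give 2*1 + 2*4 + 1*4 = 14 interaction columns.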
