Error when using step_dummy() to handle categorical variables with tidymodels, recipe, bayesian

I am trying to use the bayesian package to build a tidy Bayesian model. I am mostly following the package's "Get started" vignette, although my use case differs slightly: I am trying to model a numerical variable as a function of a categorical variable. Based on the vignette, I would expect the following code to work.

#Load libraries
library(tidyverse)
library(ggplot2)
library(recipes)
library(bayesian)
library(workflows)

#Create recipe
rec_obj <- recipe(Sepal.Length ~ ., data = iris %>%
                      select(-Sepal.Width, -Petal.Length, -Petal.Width)) %>%
  #Variable transformation
  step_dummy(Species) %>%
  step_log(Sepal.Length, base = 10)

#Create model
model_obj <- bayesian(
  family = gaussian()
) |>
  set_engine("brms") |>
  set_mode("regression")

#Create workflow
workflow_obj <- workflow() %>%
  add_recipe(rec_obj) %>%
  add_model(
    spec = model_obj,
    formula = Sepal.Length ~ Species
  )

#Fit model 
#(this step does not run)
model_fit <- workflow_obj %>%
  fit(data = iris)

I am able to create the objects for the recipe, model, and workflow. However, when I try to fit the model, I get the following error:

Error: The following variables can neither be found in 'data' nor in 'data2':
'Species'

Answer by joran:

In your workflow, change the formula to formula = Sepal.Length ~ . (a single dot, meaning "all remaining columns").

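After step_dummy(), the data no longer contain a column called Species: it has been replaced by dummy columns with different names (by default, Species_versicolor and Species_virginica). The model formula is evaluated against the processed data, so a formula that refers to Species cannot be resolved, which is what the error message is reporting.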
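To see the renaming for yourself, you can prep and bake the recipe on its own and inspect the column names. This is a minimal sketch that only needs recipes (no model is fitted); iris_sub, baked and baked_names are illustrative names that do not appear in the question.

```r
library(recipes)

# Keep only the two columns the question's recipe uses
iris_sub <- iris[, c("Sepal.Length", "Species")]

# Same steps as in the question
rec_obj <- recipe(Sepal.Length ~ ., data = iris_sub) |>
  step_dummy(Species) |>
  step_log(Sepal.Length, base = 10)

# prep() estimates the steps; bake(new_data = NULL) returns the
# processed training data, i.e. what the model would actually see
baked <- bake(prep(rec_obj), new_data = NULL)

baked_names <- names(baked)
print(baked_names)

# Species is gone; the dummy columns have taken its place
print("Species" %in% baked_names)
```

Because the processed data contain the dummy columns rather than Species, passing formula = Sepal.Length ~ . to add_model() lets the formula pick up whatever columns the recipe produced, without hard-coding their names.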