I am trying to use the bayesian package to build a tidy Bayesian model. I am mostly following the package's "Get Started" vignette, although my use case differs slightly in that I am modelling a numerical variable as a function of a categorical variable. Based on the vignette, I would expect the following code to work.
#Load libraries
library(tidyverse)
library(ggplot2)
library(recipes)
library(bayesian)
library(workflows)
#Create recipe
rec_obj <- recipe(Sepal.Length ~ ., data = iris %>%
                    select(-Sepal.Width, -Petal.Length, -Petal.Width)) %>%
  #Variable transformation
  step_dummy(Species) %>%
  step_log(Sepal.Length, base = 10)
#Create model
model_obj <- bayesian(
  family = gaussian()
) |>
  set_engine("brms") |>
  set_mode("regression")
#Create workflow
workflow_obj <- workflow() %>%
  add_recipe(rec_obj) %>%
  add_model(
    spec = model_obj,
    formula = Sepal.Length ~ Species
  )
#Fit model
#(this step does not run)
model_fit <- workflow_obj %>%
  fit(data = iris)
I am able to create the objects for the recipe, model, and workflow. However, when I try to fit the model, I get the following error:
Error: The following variables can neither be found in 'data' nor in 'data2':
'Species'
In your workflow, change the formula to
formula = Sepal.Length ~ .
After step_dummy(), the data has been transformed so that the column Species is replaced by dummy columns with different names, so a formula that refers to Species can no longer find it.
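To see what the model actually receives, you can prep and bake the recipe on its own. The following is a minimal sketch that uses only recipes and the built-in iris data; it assumes R 4.1 or later for the native pipe, and the name baked is just an illustrative variable.

```r
library(recipes)

# Same steps as the recipe in the question, limited to the two relevant columns
rec_obj <- recipe(Sepal.Length ~ Species, data = iris) |>
  step_dummy(Species) |>
  step_log(Sepal.Length, base = 10)

# prep() estimates the steps; bake(new_data = NULL) returns the processed training data
baked <- bake(prep(rec_obj), new_data = NULL)

# Inspect the column names that the model formula can refer to
names(baked)
```

With the default naming, step_dummy() creates columns such as Species_versicolor and Species_virginica and drops Species itself, which is why a formula written as Sepal.Length ~ . picks up the predictors while Sepal.Length ~ Species does not.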