How to specify a dummy model/heuristic rule as a model in tidymodels?

I'm comparing a few ML models on my dataset using tidymodels and workflowsets in R, and I also want to compare them against a baseline heuristic rule that is commonly used in the domain.

I thought it would be simple to specify the rule, e.g. y_pred = (x1 > 3) | (x2 < 1), as a model on the same data, tune nothing (since it won't change), and then compare it to all the other models using yardstick etc., treating it as just a poorly fit model.

I cannot for the life of me figure out the right way to specify it cleanly up front, in the same way as the models that actually get fit.

Best answer, by Simon Couch:

The community-contributed parsnip extension package bespoke lets you define these sorts of models. Install with:

pak::pak("macmillancontentscience/bespoke")

The main function, bespoke(), takes a function (fn) that accepts a data frame as input and returns a vector (integer, character, or factor) of predicted outcomes, with one value per input row. A quick example of how that might look in action:

library(parsnip)
library(bespoke)

dat <- data.frame(
  y = factor(sample(c("a", "b"), 10, replace = TRUE)), 
  x1 = rnorm(10), 
  x2 = rnorm(10, .5)
)

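# Prediction rule: takes a data frame, returns one factor value per row
make_pred <- function(x) {
  y_pred <- x$x1 > x$x2
  # Set the levels explicitly so this also works when every row is TRUE (or FALSE)
  factor(y_pred, levels = c(FALSE, TRUE), labels = c("a", "b"))
}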

model_spec <- bespoke(fn = make_pred)

model_spec
#> bespoke Model Specification (classification)
#> 
#> Main Arguments:
#>   fn = make_pred
#> 
#> Computational engine: bespoke

model_fit <- model_spec %>% fit(y ~ x1 + x2, dat)

predict(model_fit, dat)
#> # A tibble: 10 × 1
#>    .pred_class
#>    <fct>      
#>  1 b          
#>  2 b          
#>  3 b          
#>  4 a          
#>  5 a          
#>  6 b          
#>  7 a          
#>  8 b          
#>  9 a          
#> 10 b

Created on 2024-03-20 with reprex v2.1.0
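
For the rule in the question, the prediction function might look something like the sketch below. It uses base R only. The name rule_pred, the toy data, and the choice that a rule "hit" maps to class "b" are all assumptions made for illustration, so adjust them to your own outcome levels.

```r
# Hypothetical helper: the rule from the question as a prediction function.
# It takes a data frame and returns one factor value per row.
rule_pred <- function(x) {
  hit <- (x$x1 > 3) | (x$x2 < 1)
  # Assumption: a rule "hit" means class "b". Levels are set explicitly so
  # the result is valid even if every row lands on the same side of the rule.
  factor(hit, levels = c(FALSE, TRUE), labels = c("a", "b"))
}

# Small made-up data set, for illustration only.
toy <- data.frame(
  y  = factor(c("a", "b", "b", "b"), levels = c("a", "b")),
  x1 = c(0.5, 4.0, 2.0, 1.0),
  x2 = c(2.0, 3.0, 0.2, 1.5)
)

pred <- rule_pred(toy)

# Share of rows where the rule agrees with the observed outcome.
baseline_accuracy <- mean(pred == toy$y)
```

Following the pattern above, a function like this could then be passed as bespoke(fn = rule_pred) and fit with the same formula as the other models, so its predictions come back in the usual .pred_class column.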

:)