How to model a proportion using caret in R?


I am trying to model a proportion (or probability), i.e. a continuous variable bounded between 0 and 1 (inclusive) using caret.

In my previous post, I was told to use logistic regression (i.e. glm() with family = "binomial").

I would like to use the caret framework with k-fold cross validation.

However, caret::train() fails with the code below, although stats::glm() on its own runs without a problem. The data below is purely for reproducibility.

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

data("USArrests")

#New Data Points with 0 and 100 % urban population
newstate1 <- matrix(c(7, 100, 0, 10),
                    nrow = 1,
                    dimnames =
                      list("New State 1", colnames(USArrests)))

newstate2 <- matrix(c(7, 100, 100, 10),
                    nrow = 1,
                    dimnames =
                      list("New State 2", colnames(USArrests)))

USArrests <- rbind(USArrests, newstate1, newstate2)

# As a proportion
USArrests$UrbanPop <- USArrests$UrbanPop /100


ctrl <- trainControl(
  method = "cv", 
  number = 5,
  classProbs = TRUE
)

train(
  formula = UrbanPop ~ .,
  data = USArrests,
  method = "glm",
  family = "binomial",
  trControl = ctrl,
  metric = "ROC"
)
#> Warning in train.default(x, y, weights = w, ...): cannnot compute class
#> probabilities for regression
#> Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, : train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()

# The plain glm() call below works:
glm(
  formula = UrbanPop ~ .,
  data = USArrests,
  family = "binomial")

Created on 2024-01-30 with reprex v2.0.2
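
For reference, the error above seems to come from asking for classification-only options (classProbs = TRUE and metric = "ROC") while the outcome is numeric, which puts train() in regression mode. A minimal sketch of one possible workaround follows. It assumes a regression metric such as RMSE is acceptable, and it swaps in quasibinomial() as my own choice, because family = "binomial" also fits here but warns about non-integer successes. The names dat, extra, fit and preds and the seed are purely illustrative.

```r
library(caret)

data("USArrests")

# Same two extra rows as above (0% and 100% urban population)
extra <- data.frame(Murder = 7, Assault = 100, UrbanPop = c(0, 100), Rape = 10,
                    row.names = c("New State 1", "New State 2"))
dat <- rbind(USArrests, extra)

# As a proportion in [0, 1]
dat$UrbanPop <- dat$UrbanPop / 100

# No classProbs: a numeric outcome is handled as a regression problem
ctrl <- trainControl(method = "cv", number = 5)

set.seed(1)  # arbitrary seed so the CV folds are reproducible
fit <- train(
  UrbanPop ~ .,
  data = dat,
  method = "glm",
  family = quasibinomial(),  # assumption: swapped in for "binomial"; passed on to glm()
  trControl = ctrl,
  metric = "RMSE"            # regression metric instead of "ROC"
)

fit$results                       # cross-validated RMSE, Rsquared, MAE
preds <- predict(fit, newdata = dat)  # predictions on the response (proportion) scale
head(preds)
```

Because the logit link is kept, the predictions stay strictly between 0 and 1, and the RMSE is computed on the proportion scale across the 5 folds.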
