I am trying to model a proportion (or probability), i.e. a continuous variable bounded between 0 and 1 (inclusive) using caret.
In my previous post, I was told to use a logistic regression (i.e. glm() with family = "binomial").
I would like to use the caret framework with k-fold cross validation.
However, caret doesn't work with the code below, although I have no problem using stats::glm() on its own. The data below is purely for reproducibility.
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice
data("USArrests")
# New data points with 0% and 100% urban population
newstate1 <- matrix(c(7, 100, 0, 10),
                    nrow = 1,
                    dimnames =
                      list("New State 1", colnames(USArrests)))
newstate2 <- matrix(c(7, 100, 100, 10),
                    nrow = 1,
                    dimnames =
                      list("New State 2", colnames(USArrests)))
USArrests <- rbind(USArrests, newstate1, newstate2)
# As a proportion
USArrests$UrbanPop <- USArrests$UrbanPop /100
ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE
)
train(
  formula = UrbanPop ~ .,
  data = USArrests,
  method = "glm",
  family = "binomial",
  trControl = ctrl,
  metric = "ROC"
)
#> Warning in train.default(x, y, weights = w, ...): cannnot compute class
#> probabilities for regression
#> Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, : train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
# The call below works:
glm(
  formula = UrbanPop ~ .,
  data = USArrests,
  family = "binomial")
Created on 2024-01-30 with reprex v2.0.2
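
One possible reading of the warning and error above is that train() treats the numeric UrbanPop as a regression outcome, in which case classProbs = TRUE and metric = "ROC" (both classification settings) would not apply. The following is only a minimal sketch under that assumption, not a confirmed fix: it drops both settings and resamples on RMSE instead. The names dat, ctrl_reg, fit and preds are illustrative and do not appear in the code above, and the two extra rows with 0 and 100 % urban population are left out for brevity.

```r
library(caret)

data("USArrests")
dat <- USArrests
dat$UrbanPop <- dat$UrbanPop / 100   # outcome as a proportion in [0, 1]

# Assumption: the outcome is numeric, so this is set up as a regression
# problem (no classProbs, no ROC).
ctrl_reg <- trainControl(
  method = "cv",
  number = 5
)

set.seed(1)                          # CV folds are drawn at random
fit <- train(
  UrbanPop ~ .,
  data = dat,
  method = "glm",
  family = "binomial",               # passed through to glm(); glm() may warn
                                     # about non-integer #successes
  trControl = ctrl_reg,
  metric = "RMSE"
)

fit$results                          # cross-validated RMSE, Rsquared, MAE

# Predictions are on the response scale, so they stay within [0, 1]
preds <- predict(fit, newdata = dat)
range(preds)
```

If the non-integer warning is a concern, family = quasibinomial() is a commonly used alternative for proportions; whether it suits this data is a separate modelling question.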