I have a dataset that looks like this, where "1" means a host is infected and "0" means a host is uninfected at the specified dose. To generate the ROC curve, the ROC function needs the observed data along with the false-positive and true-positive rates. I think I am missing a step or have miscalculated something, but I'm not sure what.
```r
library(pROC)
dataname <- data.frame(Dose = c(rep(0.2, 8), rep(0.3, 7), rep(0.7, 10)),
                       Infected = c(rep(0, 20), rep(1, 5)))
```
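A quick cross-tabulation with base R's `table()` is one way to check this layout. In this sketch `counts` is just an illustrative name, and the data frame is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question
dataname <- data.frame(Dose = c(rep(0.2, 8), rep(0.3, 7), rep(0.7, 10)),
                       Infected = c(rep(0, 20), rep(1, 5)))

# Count uninfected (0) and infected (1) hosts at each dose
counts <- table(Dose = dataname$Dose, Infected = dataname$Infected)
print(counts)
```

All five infections fall in the 0.7 dose group, which also contains five uninfected hosts.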
I used `glm()` to get the probability of each host being infected at each dose.
```r
# logistic model
logistic <- glm(
  formula = Infected ~ Dose,
  data = dataname,
  family = binomial(link = 'logit')
)
```
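Every host at a given dose shares the same linear predictor, so the fitted values take at most three distinct values. One way to inspect them is `predict()` with `type = "response"`; in this sketch `doses` is a hypothetical helper data frame with one row per distinct dose.

```r
dataname <- data.frame(Dose = c(rep(0.2, 8), rep(0.3, 7), rep(0.7, 10)),
                       Infected = c(rep(0, 20), rep(1, 5)))

# Same logistic model as above
logistic <- glm(Infected ~ Dose, data = dataname, family = binomial(link = "logit"))

# Hypothetical helper: one row per distinct dose, with its predicted probability
doses <- data.frame(Dose = sort(unique(dataname$Dose)))
doses$prob.inf <- predict(logistic, newdata = doses, type = "response")
print(doses)
```

Because every infection occurs at the highest dose, `glm()` may also warn that fitted probabilities numerically 0 or 1 occurred. The probabilities still increase with dose, and that ordering is all the ROC curve uses.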
I then ordered the probabilities from lowest to highest and ranked them:
```r
predicted.data <- data.frame(prob.inf = logistic$fitted.values, Infected = dataname$Infected)
predicted.data <- predicted.data[order(predicted.data$prob.inf, decreasing = FALSE), ]
predicted.data$rank <- 1:nrow(predicted.data)
```
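For reference, the false-positive and true-positive rates can be computed directly from the fitted probabilities without any sorting or ranking step. This is a base-R sketch, not pROC's internal code; the helper `roc_points` and the variable `auc_manual` are hypothetical names. Each distinct probability is used as a threshold, hosts at or above it are classed as infected, and the AUC is the trapezoidal area under the resulting points.

```r
dataname <- data.frame(Dose = c(rep(0.2, 8), rep(0.3, 7), rep(0.7, 10)),
                       Infected = c(rep(0, 20), rep(1, 5)))
logistic <- glm(Infected ~ Dose, data = dataname, family = binomial(link = "logit"))

# Hypothetical helper: one ROC point per threshold
roc_points <- function(truth, prob) {
  # Thresholds from strictest (nobody classed as infected) to loosest
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  tpr <- sapply(thresholds, function(th) sum(prob >= th & truth == 1) / sum(truth == 1))
  fpr <- sapply(thresholds, function(th) sum(prob >= th & truth == 0) / sum(truth == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Outcome and fitted value come from the same rows, so they stay paired
pts <- roc_points(dataname$Infected, logistic$fitted.values)
print(pts)

# Trapezoidal area under the (fpr, tpr) points
auc_manual <- sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
print(auc_manual)
```

As a check on the arithmetic: each infected host (all at dose 0.7) has a higher fitted probability than the 15 uninfected hosts at doses 0.2 and 0.3, and ties with the 5 uninfected hosts at 0.7. Counting a tie as half gives an AUC of (15 + 0.5 × 5) / 20 = 0.875.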
I then ran the roc function and plotted the curve:
```r
roc_data <- roc(dataname$Infected, predicted.data$prob.inf)
plot(roc_data, main = "ROC Curve", print.auc = TRUE, xlim = (0:1), ylim = (0:1))
```


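You do not need to order and rank the predicted probabilities: `roc()` works out the thresholds from the predictor values itself. Reordering `predicted.data` also means `predicted.data$prob.inf` is no longer guaranteed to line up row by row with `dataname$Infected`, which was not reordered. Assuming you are using the `roc()` function from the `pROC` package, you can simply feed it your response `dataname$Infected` and your fitted values `logistic$fitted.values`, which are both in the original row order. The resulting plot seems correct to me.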
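A minimal sketch of that call, assuming the pROC package is installed. `auc()` extracts the area under the curve from the `roc` object, and `print.auc = TRUE` is the same plotting argument used in the question.

```r
library(pROC)

dataname <- data.frame(Dose = c(rep(0.2, 8), rep(0.3, 7), rep(0.7, 10)),
                       Infected = c(rep(0, 20), rep(1, 5)))
logistic <- glm(Infected ~ Dose, data = dataname, family = binomial(link = "logit"))

# Response and fitted values are both in the original row order,
# so no sorting or ranking is needed
roc_data <- roc(dataname$Infected, logistic$fitted.values)
print(auc(roc_data))

plot(roc_data, main = "ROC Curve", print.auc = TRUE)
```

`roc()` prints messages about the control/case levels and the direction it chose; these are informational.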