I have a data frame with a large number of variables; one of them, the probability of death (PoD), is to be predicted from all the others. As a preliminary step I want to estimate the PoD by computing the death rate within bins of each variable.
Let's say df <- data.frame(age = c(25, 57, 60), weight = c(80, 92, 61), cigarettes_a_day = c(30, 2, 19), death_flag = c(1, 0, 1))
Then I can group by age (say under 50 and over 50) and compute the PoD of each group as its death rate: the sum of death_flag divided by the number of people in the group, or simply the mean of death_flag. When grouping by weight (say below and above 80) I will obtain a different death rate, and thus a different PoD, for each binned variable, which is what I want. My problem arises when I try to iterate over all variables.
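To make the per-variable computation concrete, here is a minimal base R sketch for two of the variables. The cut points (50 and 80) come from the description above; the bin labels and the choice to put a value exactly on the cut point into the lower bin are assumptions made for illustration only.

```r
# Example data from the question
df <- data.frame(
  age = c(25, 57, 60),
  weight = c(80, 92, 61),
  cigarettes_a_day = c(30, 2, 19),
  death_flag = c(1, 0, 1)
)

# Two bins per variable; cut() uses right-closed intervals by default,
# so a value exactly on the cut point lands in the lower bin (assumption).
age_bin <- cut(df$age, breaks = c(-Inf, 50, Inf),
               labels = c("50_or_under", "over_50"))
weight_bin <- cut(df$weight, breaks = c(-Inf, 80, Inf),
                  labels = c("80_or_under", "over_80"))

# Death rate per bin = mean of death_flag within each bin
pod_by_age <- tapply(df$death_flag, age_bin, mean)
pod_by_weight <- tapply(df$death_flag, weight_bin, mean)

print(pod_by_age)
print(pod_by_weight)
```

With three rows the rates are of course not meaningful; the point is only that each binned variable yields its own set of death rates.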
So far I've tried variants of the following piece of code, which however does not work:
for (n in names(df)) {
  df %>% group_by(n) %>%
    summarise(PoD_bin = mean(death_flag))
}
I haven't figured out a way to run through all variables and perform the computation.
As a side note, I have done the binning of the variables without dplyr:
for (v in names(df[-1])) {
  newVar <- paste(v, "bin", sep = "_")  # name of the new column, e.g. "weight_bin"
  df[newVar] <- cut(as.matrix(df[v]), breaks = 100)  # 100 equal-width bins
}
I am irritated that I cannot refer to the variables by name for the grouping in the first for loop, while I can do so in the second one to create new columns of the df.
Help is greatly appreciated!
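Your loop doesn't work because a character string is passed to group_by: n holds the column name as text, and group_by(n) does not look up the column with that name. You could modify your loop a little bit and get the desired result. I have added print() to see the output, since results are not printed automatically inside a for loop.
Output: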
Data:
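A minimal sketch of one way to modify the loop, using the example data from the question. It assumes dplyr 1.0 or later, where across(all_of(n)) selects the grouping column by the name stored in n; this is one of several possible approaches and not necessarily the one originally posted. The predictors vector and the results list are names introduced here for illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  age = c(25, 57, 60),
  weight = c(80, 92, 61),
  cigarettes_a_day = c(30, 2, 19),
  death_flag = c(1, 0, 1)
)

# Loop over every column except the outcome (hypothetical helper name)
predictors <- setdiff(names(df), "death_flag")
results <- list()

for (n in predictors) {
  res <- df %>%
    group_by(across(all_of(n))) %>%   # group by the column whose name is stored in n
    summarise(PoD_bin = mean(death_flag), .groups = "drop")
  print(res)                          # nothing is auto-printed inside a for loop
  results[[n]] <- res                 # keep each summary for later use
}
```

In practice the loop would run over the binned columns (those ending in _bin) rather than the raw values; with the raw values every row here forms its own group.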