I am trying to create a function that will add columns that order multiple columns and allocate data into four groups with equal numbers of observations (these will constitute four risk levels). I understand that where there are multiple zeroes this may not be possible, but would like this to work otherwise.
I need to be able to do this with mutate_at as I have 200 different score columns
Sample dataset below. I have used the cut function, the cut function produces groups of unequal sizes even where there are not multiple 0 values
library(tidyverse)
dat <- data.frame(Suburb = c("Werribee", "Hoppers", "Carlton", "North", "Serrano", "Upwell","Market", "Poynton", "Stawell"),
scores_1 = c(0,0.0001, 230, 340, 340, 10, 3490, 5,21),
scores_2 = c(12, 0,0,0,0,0,0, 45,13),
scores_3 = c(4,5,6,7,8,9,10, 45, 45))
allscores <- c("scores_1", "scores_2", "scores_3")
myresult <- dat %>%
mutate_at(.vars = allscores,
.funs = list(risk_levels = ~ as.integer(cut(as.numeric(.), breaks = 4))))
I have also tried to use cut_numberfunction but have gotten the following error:
risk.data2 <- dat %>%
mutate_at(.vars = allscores,
.funs = list(risk_levels = ~ as.integer(cut_number(as.numeric(.), n = 4))))
Error in `mutate_cols()`:
! Problem with `mutate()` column `scores_2_risk_levels`.
i `scores_2_risk_levels = (structure(function (..., .x = ..1, .y = ..2, . = ..1) ...`.
x Insufficient data values to produce 4 bins.
Caused by error in `cut_number()`:
! Insufficient data values to produce 4 bins.
How can I use tidyverse to order and then group my many score values into four roughly equal size groups?
You can convert each column into ranks before
cut().