I have survey data in a data frame recording the minutes taken to complete a journey, and the column is of character type. Some entries are written as a range, e.g. '5-10', and I want to replace these entries with the mean of the range.
My data looks like this.
[1] "30" "15"
[3] "30" "15 Minutes "
[5] "15" "20 mins "
[7] "30" "half an hour to 40 minutes"
[9] "30" "40"
[11] "20" "30"
[13] "15" "20"
[15] "40" "20"
[17] "40" "30"
[19] "15" "15"
[21] "20" "30mins"
[23] "20" "20"
[25] "15" "40"
[27] "15" "25"
[29] "30" "20"
[31] "Depends where you live" "30-45"
[33] "30 min " "20"
[35] "30" "20"
[37] "60 minutes" "30 mins"
[39] "15" "10"
[41] "20" "40"
[43] "60" "60"
[45] "30" "49"
[47] "50 minutes" "20 minutes"
[49] "90" "7-10 minutes "
[51] "15-20" "25 minutes"
[53] "25" "45"
[55] "60 minutes " "2-4 hours"
[57] "30" "30 min"
[59] "20" "30"
[61] "20" "25"
[63] "2-4hrs" "30"
[65] "45" "45"
[67] "75" "20"
[69] "60" "45mins"
[71] "60" "20"
I have tried the following code:
library(dplyr)    # %>% and mutate()
library(stringr)  # str_split()

# Attempt 1: case_when()
data <- data %>% mutate(
  est_time = case_when(
    grepl('-', est_time) ~ mean(as.numeric(unlist(str_split(est_time, '-'))))
  ))

# Attempt 2: ifelse()
data <- data %>% mutate(
  est_time = ifelse(
    grepl('-', est_time),
    mean(as.numeric(unlist(str_split(est_time, '-')))),
    est_time)
)
Each time, I receive:
Warning message:
Problem while computing `est_time = case_when(...)`.
ℹ NAs introduced by coercion
I suspect this may be because the unlist function spreads the list data over multiple rows.
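The suspicion can be checked on a few of the values above. This is a minimal sketch that uses base R's strsplit() in place of stringr::str_split() (they split the same way on a plain '-' here); the four-element vector is a hand-picked subset of the data, not the full column:

```r
# A hand-picked subset of the question's est_time values
est_time <- c("30", "15-20", "7-10 minutes ", "2-4hrs")

# strsplit() returns a list with one character vector per entry
parts <- strsplit(est_time, "-")

# unlist() flattens that list into ONE vector for the whole column,
# so the per-row grouping is lost
flat <- unlist(parts)
print(flat)

# as.numeric() turns "10 minutes " and "4hrs" into NA (with a warning),
# and mean() then returns a single value for the whole column: NA here
result <- suppressWarnings(mean(as.numeric(flat)))
print(result)
```

Inside mutate(), that single value is recycled to every row where grepl() is TRUE. In the case_when() attempt, rows that match no condition also become NA, because there is no TRUE ~ est_time fallback.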
How can I resolve this and achieve my aim?
Instead of using case_when or ifelse, an option is to select only the rows containing '-', read them with read.table (using '-' as the separator), get the rowMeans and assign the result back.
In case we want to take the mean of all the entries containing '-' (i.e. including entries like 2-4 hours or 7-10 minutes in addition to 15-20), the trailing unit text has to be removed before splitting.
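A sketch of the read.table/rowMeans approach described above, assuming the column is data$est_time. The sample data frame is hypothetical (a subset of the question's values), and the regular expressions used to pick out the ranges are assumptions rather than the original answer's code:

```r
# Hypothetical sample built from a subset of the question's values
data <- data.frame(
  est_time = c("30", "15 Minutes ", "30-45", "7-10 minutes ", "15-20",
               "2-4 hours", "2-4hrs", "Depends where you live"),
  stringsAsFactors = FALSE
)

# Variant 1: replace only bare numeric ranges such as "30-45" or "15-20"
d1 <- data
i1 <- grepl("^\\d+-\\d+$", d1$est_time)
# read.table() splits each "a-b" line on "-" into two numeric columns,
# rowMeans() averages them, and the result is assigned back to those rows
d1$est_time[i1] <- rowMeans(read.table(text = d1$est_time[i1], sep = "-"))

# Variant 2: replace every entry containing "-", e.g. "7-10 minutes " or
# "2-4hrs", by first keeping only the leading "a-b" numbers
# (assumes every entry with "-" starts with such a numeric range)
d2 <- data
i2 <- grepl("-", d2$est_time)
ranges <- sub("^\\s*(\\d+)\\s*-\\s*(\\d+).*$", "\\1-\\2", d2$est_time[i2])
d2$est_time[i2] <- rowMeans(read.table(text = ranges, sep = "-"))

print(d1)
print(d2)
```

Two caveats: the units are not converted, so "2-4 hours" becomes 3 (still hours, not minutes), and the column stays character, so a later as.numeric() will still give NA for entries such as "15 Minutes " or "Depends where you live".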