R: Change '5-10' character to 7.5 numeric in dataframe row, NAs introduced by coercion

Question

Asked by Fionn McGlacken on 26 November 2022 at 17:05

I have survey data of minutes to complete a journey in a dataframe, of character type. Some entries are written as a range, e.g. '5-10'. I want to change these entries to the mean of the range.

My data looks like this.

 [1] "30"                         "15"                        
 [3] "30"                         "15 Minutes "               
 [5] "15"                         "20 mins "                  
 [7] "30"                         "half an hour to 40 minutes"
 [9] "30"                         "40"                        
[11] "20"                         "30"                        
[13] "15"                         "20"                        
[15] "40"                         "20"                        
[17] "40"                         "30"                        
[19] "15"                         "15"                        
[21] "20"                         "30mins"                    
[23] "20"                         "20"                        
[25] "15"                         "40"                        
[27] "15"                         "25"                        
[29] "30"                         "20"                        
[31] "Depends where you live"     "30-45"                     
[33] "30 min "                    "20"                        
[35] "30"                         "20"                        
[37] "60 minutes"                 "30 mins"                   
[39] "15"                         "10"                        
[41] "20"                         "40"                        
[43] "60"                         "60"                        
[45] "30"                         "49"                        
[47] "50 minutes"                 "20 minutes"                
[49] "90"                         "7-10 minutes "             
[51] "15-20"                      "25 minutes"                
[53] "25"                         "45"                        
[55] "60 minutes "                "2-4 hours"                 
[57] "30"                         "30 min"                    
[59] "20"                         "30"                        
[61] "20"                         "25"                        
[63] "2-4hrs"                     "30"                        
[65] "45"                         "45"                        
[67] "75"                         "20"                        
[69] "60"                         "45mins"                    
[71] "60"                         "20"

I have tried the following code:

data <- data %>% mutate(
  est_time = case_when(
    grepl('-', est_time) ~  mean(as.numeric(unlist(str_split(est_time, '-'))))
))

data <- data %>% mutate(
  est_time = ifelse(
    grepl('-', est_time),
    mean(as.numeric(unlist(str_split(est_time, '-')))),
    est_time)
  )

Each time, I receive:

Warning message:
Problem while computing `est_time = case_when(...)`.
ℹ NAs introduced by coercion

I suspect this may be because the unlist function spreads the list data over multiple rows.

How can I resolve this and achieve my aim?

**akrun** · Accepted Answer · 2022-11-26T17:12:19.663000

Instead of using case_when or ifelse, one option is to select only the rows that consist solely of a range such as 15-20, parse them with read.table, take the rowMeans, and assign the result back:

i1 <- grepl('^(\\d+)-(\\d+)$', data$est_time)
data$est_time[i1] <- rowMeans(read.table(text = data$est_time[i1], 
     sep = '-', header = FALSE), na.rm = TRUE)
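
To make the intermediate steps visible, here is a minimal sketch on a small made-up vector (the values and the names parts, mid and whole are illustrative assumptions, not the asker's real column). It also shows, for contrast, what the column-wide call from the question does: it pools every piece into a single mean, and any non-numeric piece turns that mean into NA.

```r
# Hypothetical sample vector, not the asker's full data
est_time <- c("30", "15-20", "30-45", "7-10 minutes ", "20 mins ")

# TRUE only for entries that are exactly "<digits>-<digits>"
i1 <- grepl("^(\\d+)-(\\d+)$", est_time)

# Split the selected entries into two numeric columns, one row per entry
parts <- read.table(text = est_time[i1], sep = "-", header = FALSE)

# The row-wise mean is the midpoint of each range
mid <- rowMeans(parts, na.rm = TRUE)

# Assigning into a character vector stores the midpoints as strings
est_time[i1] <- mid
print(est_time)

# For contrast: pooling all pieces, as in the question, gives one value,
# and a piece such as "20 mins " coerces to NA
whole <- suppressWarnings(
  mean(as.numeric(unlist(strsplit(c("15-20", "20 mins "), "-"))))
)
print(whole)
```

Note that the column stays character after the assignment, so a further as.numeric() step would still be needed to get a numeric column, and entries such as "7-10 minutes " are deliberately left untouched by this pattern.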

If we want to take the mean for every entry that contains a range (i.e. including entries such as 2-4 hours or 7-10 minutes, in addition to 15-20):

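library(stringr)
library(dplyr)
data %>%
   mutate(est_time2 = str_replace_all(est_time, "\\d+-\\d+",
   # vectorised over matches and returns character, so it works whether
   # the function receives one match or several at a time
   function(x) as.character(vapply(strsplit(x, "-", fixed = TRUE),
        function(p) mean(as.numeric(p)), numeric(1)))))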

Output:

# A tibble: 9 × 2
  est_time                 est_time2               
  <chr>                    <chr>                   
1 "15 Minutes"             "15 Minutes"            
2 "20 mins"                "20 mins"               
3 "40"                     "40"                    
4 "15"                     "15"                    
5 "Depends where you live" "Depends where you live"
6 "7-10 minutes "          "8.5 minutes "          
7 "15-20"                  "17.5"                  
8 "2-4 hours"              "3 hours"               
9 "30 min"                 "30 min"
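
The same in-place replacement can also be sketched in base R, without stringr or dplyr. This is only an illustrative alternative, not part of the original answer; the helper name range_to_mid is hypothetical. It finds every "<digits>-<digits>" run with gregexpr and swaps in the midpoint through regmatches<-.

```r
# Hypothetical helper: replace each "<digits>-<digits>" run with its midpoint,
# leaving the rest of the string (units, free text) unchanged
range_to_mid <- function(x) {
  m <- gregexpr("[0-9]+-[0-9]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # hits holds the matched ranges for one element (possibly none)
    vapply(strsplit(hits, "-", fixed = TRUE),
           function(p) as.character(mean(as.numeric(p))),
           character(1))
  })
  x
}

# Same sample values as in the answer's example data
est_time <- c("15 Minutes", "20 mins", "40", "15", "Depends where you live",
              "7-10 minutes ", "15-20", "2-4 hours", "30 min")
est_time2 <- range_to_mid(est_time)
print(est_time2)
```

As with the stringr version, the result is still a character vector and the units are not normalised, so "2-4 hours" becomes "3 hours" rather than a number of minutes.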

Data:

data <- structure(list(est_time = c("15 Minutes", "20 mins", "40", "15", 
"Depends where you live", "7-10 minutes ", "15-20", "2-4 hours", 
"30 min")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L))
