read_csv() read a date column that has more than one format

64 Views Asked by At

Suppose I have a csv file with more than one date format in a single column.

library(tidyverse)

df <- 
  r'(
    id,date
    1,2023-01-01
    2,1/1/2023
  )'

If I use read_csv() to get this data into R, I can pass it a column specification to make sure the columns are read as the correct types. However, if the dates aren't formatted consistently, read_csv() isn't able to read at least one of them.

col_spec <-
  cols(
    id = col_character(),
    date = col_date()
  )

read_csv(df, col_types = col_spec)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> # A tibble: 2 x 2
#>   id    date      
#>   <chr> <date>    
#> 1 1     2023-01-01
#> 2 2     NA

And if I specify a different date format, the other one isn't parsed properly.

col_spec <-
  cols(
    id = col_character(),
    date = col_date(format = '%d/%m/%Y')
  )

read_csv(df, col_types = col_spec)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> # A tibble: 2 x 2
#>   id    date      
#>   <chr> <date>    
#> 1 1     NA        
#> 2 2     2023-01-01

I know I can read everything as a string and reformat it after reading it, but is there a way to pass multiple date formats into col_date()? I know that lubridate::parse_date_time() has an orders argument that allows multiple formats to be specified. Is there some way to do that using readr?

1

There are 1 best solutions below

1
Grzegorz Sapijaszko On

Use {anytime} package. And probably don't specify column types, or specify this covering date as a char().

df <- data.frame(
  id = c(1, 2),
  date = c("2023-01-01", "1/1/2023")
)

df |>
  dplyr::mutate(date = anytime::anydate(date))
#>   id       date
#> 1  1 2023-01-01
#> 2  2 2023-01-01

Created on 2024-01-24 with reprex v2.1.0