Suppose I have a csv file with more than one date format in a single column.
library(tidyverse)
df <-
r'(
id,date
1,2023-01-01
2,1/1/2023
)'
If I use read_csv() to get this data into R, I can pass it a column specification to make sure the columns are read as the correct types. However, if the dates aren't formatted consistently, read_csv() isn't able to read at least one of them.
col_spec <-
cols(
id = col_character(),
date = col_date()
)
read_csv(df, col_types = col_spec)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> # A tibble: 2 x 2
#> id date
#> <chr> <date>
#> 1 1 2023-01-01
#> 2 2 NA
And if I specify a different date format, the other one isn't parsed properly.
col_spec <-
cols(
id = col_character(),
date = col_date(format = '%d/%m/%Y')
)
read_csv(df, col_types = col_spec)
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#> dat <- vroom(...)
#> problems(dat)
#> # A tibble: 2 x 2
#> id date
#> <chr> <date>
#> 1 1 NA
#> 2 2 2023-01-01
I know I can read everything as a string and reformat it after reading it, but is there a way to pass multiple date formats into col_date()? I know that lubridate::parse_date_time() has an orders argument that allows multiple formats to be specified. Is there some way to do that using readr?
Use {anytime} package. And probably don't specify column types, or specify this covering date as a
char().Created on 2024-01-24 with reprex v2.1.0