I am using readr::read_csv to import a series of files, then updating them with data through an API. read_csv generally does a good job of guessing column types, but it seems to default to logical if a column has no data in the first 1000 rows of the file. If I were always using the same file, or knew which columns each file had, I could specify the column types (e.g. using col_types = cols(sea_level_pressure_set_1d = col_double(), ...)), but since there are multiple files, they don't all have all the columns.
Specifically, it seems that read_csv defaults to logical, which leads to parsing failures.
Is there a way to force read_csv to follow a hierarchy of column types; limit its options to, say, just character, double and datetime; or use a .default setting with unknown exceptions? Using the .default argument, it seems that I need to specify all the exceptions, and my problems arise when a file has a different format for an exception.
I would like read_csv to assign only datetime, numeric, and character columns.
I think I found your solution but may need more details from you to confirm:
First step: get a `cols()` specification for each of your files. You can write these manually or use the `spec_csv()` function to extract them automatically.
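As a minimal sketch of this first step, the snippet below builds one specification per file with `spec_csv()`. The two small CSV files, their column names, and the variable names `files` and `specs` are hypothetical and only there to make the example runnable.

```r
library(readr)
library(purrr)

# Hypothetical example data: two small files that do not share all columns.
dir <- tempfile("csv_demo_")
dir.create(dir)
write_csv(data.frame(station = c("A", "B"), air_temp_set_1 = c(1.5, 2.5)),
          file.path(dir, "file1.csv"))
write_csv(data.frame(station = c("C", "D"),
                     sea_level_pressure_set_1d = c(1013.2, 1009.8)),
          file.path(dir, "file2.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# spec_csv() guesses the column types of a file and returns a cols() specification.
specs <- map(files, spec_csv)
names(specs) <- basename(files)

# Printing shows one cols(...) specification per file.
specs
```

Each element of `specs` can later be passed to the `col_types` argument of `read_csv`.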
Second step: assign a default (optional). If you want to assign a default column type, you can use a for loop (I couldn't figure out how to do this with `map`, but the loop works). This takes advantage of the structure of the `cols` class object and sets the default based on what you assign to it. If you assign it a literal character value, the default will become character. If you assign it a literal numeric value, the default will become numeric.
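One way the loop might look is sketched below. It assumes a specification object is a list with a `cols` element and a `default` element, and it assigns an explicit collector (`col_character()`) rather than a literal value, which is a substitution on my part. The example file and the name `specs` are hypothetical.

```r
library(readr)

# Hypothetical example file; in practice `specs` would hold one spec per real file.
path <- tempfile(fileext = ".csv")
write_csv(data.frame(station = c("A", "B"), air_temp_set_1 = c(1.5, 2.5)), path)
specs <- list(spec_csv(path))

# Assumption: the spec stores its fallback type in the `default` element.
# An explicit collector is assigned here instead of a literal value.
for (i in seq_along(specs)) {
  specs[[i]]$default <- col_character()
}

# The printed spec should now include the new default.
specs[[1]]
```

Note that the default only applies to columns that are not named in the specification, so columns that were already guessed for a given file keep their guessed types.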
Step 3: With the variable `list`, which stores each file's `cols()` specification, we then need to make a new tibble that pairs the `cols()` specifications with their respective files. We can do this with a simple tibble.

Step 4: Then use the `map2_df` function, which lets you pass two vectors simultaneously (in this case the `cols()` specifications (`.x`) and the file paths (`.y`)) to a common function, `read_csv`. That should save all your files to a new tibble.
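Steps 3 and 4 could be sketched roughly as follows. The example files and the names `files`, `specs`, `file_specs`, and `combined` are hypothetical, and `map2_df` needs the dplyr package installed because it row-binds the results.

```r
library(readr)
library(tibble)
library(purrr)

# Hypothetical example data: two small files that do not share all columns.
dir <- tempfile("csv_demo_")
dir.create(dir)
write_csv(data.frame(station = c("A", "B"), air_temp_set_1 = c(1.5, 2.5)),
          file.path(dir, "file1.csv"))
write_csv(data.frame(station = c("C", "D"),
                     sea_level_pressure_set_1d = c(1013.2, 1009.8)),
          file.path(dir, "file2.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
specs <- map(files, spec_csv)

# Step 3: pair each file path with its specification (stored as a list-column).
file_specs <- tibble(path = files, spec = specs)

# Step 4: read every file with its own specification and row-bind the results.
# Columns missing from a file are filled with NA in the combined tibble.
combined <- map2_df(file_specs$spec, file_specs$path,
                    ~ read_csv(file = .y, col_types = .x))

combined
```

If the same column ends up with incompatible types in different files, the row-binding step may fail, so it can be worth checking the specifications before combining.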