How can I read the column specification (`col_types`) for `readr::read_delim()` from a file?
Instead of
```r
library(readr)

read_csv(
  file = I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
  col_types = cols(
    varInt  = "i",
    varChar = "c",
    varFac  = col_factor(levels = c("A1", "A2", "A3"))
  )
)
#> # A tibble: 3 × 3
#>   varInt varChar varFac
#>    <int> <chr>   <fct>
#> 1      1 a       A1
#> 2      2 b       A2
#> 3      3 c       A3
```
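I would like to do something like this instead:

```r
library(readr)
library(dplyr)  # for pull()

# The spec "file": one row per column; ';' is used inside c() so the
# commas do not split the CSV field
mySpecFile <- read_csv(file = I("Variable,Spec\nvarInt,i\nvarChar,c\nvarFac,col_factor(levels = c('A1'; 'A2'; 'A3'))"))
mySpec <- mySpecFile |> pull(Spec, Variable) |> as.list()

read_csv(
  file = I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
  col_types = mySpec
)
```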
But this throws:

Error: Unknown shortcut: col_factor(levels = c('A1'; 'A2'; 'A3'))

So specifying factor levels this way does not work for me.
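For comparison, a minimal sketch (assuming readr 2.x, where `I()` marks literal input data) showing that passing `col_types` as a plain `list()` is not the problem in itself. readr accepts single-letter shortcut strings and collector objects side by side; what gets rejected as an "unknown shortcut" is a *string* that merely looks like a `col_factor(...)` call.

```r
library(readr)

# Shortcut strings ("i", "c") and a real collector object can be mixed;
# readr expands the strings and keeps the collector as-is.
spec <- list(
  varInt  = "i",
  varChar = "c",
  varFac  = col_factor(levels = c("A1", "A2", "A3"))
)

df <- read_csv(
  I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
  col_types = spec
)
str(df)
```

So the task reduces to turning the text `col_factor(...)` read from the spec file into an actual collector object before handing the list to `read_csv()`.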
This seems related: "R readr col_types specified in a metadata file, specifically using custom date formats".
However, the `readr::read_delim()` documentation for `col_types` says:
One of NULL, a cols() specification, or a string. See vignette("readr") for more details.
If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.
Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().
Alternatively, you can use a compact string representation where each character represents one column:
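The compact string form is convenient, but it has no room for arguments. A short sketch, assuming readr 2.x: the letter `f` requests a factor column, yet its levels are discovered from the data, so the string alone cannot pin them to a fixed set such as `A1`, `A2`, `A3` or control their order.

```r
library(readr)

# One character per column: i = integer, c = character, f = factor.
dat <- read_csv(
  I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
  col_types = "icf"
)

str(dat)
levels(dat$varFac)  # discovered from the values in the file, not supplied
```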
A few things:

- The `varFac` spec is a string containing `col_factor(...)`, not a call or expression (or the result of evaluating one). We can, however, try to evaluate it.
- Your `varFac,col_factor(levels = c('A1'; 'A2'; 'A3'))` is not a valid R expression: the `;` must be replaced with `,`. That in turn means the spec CSV likely needs to be `;`-delimited (or use some delimiter other than `,`).
- The `if (nchar(z) > 1)` check guards against `"c"` (the shortcut for character) being evaluated into R's `c` function, and possibly against other short strings being misinterpreted. If you want more specificity, change that condition to something else.
- The `tryCatch(.., error = function(e) z)` ensures that if the string is not a valid expression, the original string is returned unchanged.

As an alternative to `;`-delimited text, we can quote the spec fields (or just the one string) to protect the embedded commas.
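A minimal sketch of that approach, under these assumptions: readr 2.x and dplyr are installed, and `to_collector()` is a hypothetical helper name. It parses each spec string with base R's `str2lang()` and evaluates it with `eval()`; single-character shortcuts are left as strings, and anything that fails to evaluate falls back to the original string. Both spec-file variants (`;`-delimited, and comma-delimited with the field quoted) are shown.

```r
library(readr)
library(dplyr)  # for pull()

# Variant 1: ';'-delimited spec, so the commas inside c(...) are safe
mySpecFile <- read_delim(
  I("Variable;Spec\nvarInt;i\nvarChar;c\nvarFac;col_factor(levels = c('A1', 'A2', 'A3'))"),
  delim = ";",
  col_types = "cc"
)

# Variant 2: comma-delimited spec with the col_factor() field quoted
mySpecFile2 <- read_csv(
  I("Variable,Spec\nvarInt,i\nvarChar,c\nvarFac,\"col_factor(levels = c('A1', 'A2', 'A3'))\""),
  col_types = "cc"
)

# Hypothetical helper: evaluate a spec string into a collector when possible
to_collector <- function(z) {
  if (nchar(z) > 1) {
    # Not a one-letter shortcut: try to evaluate it as R code;
    # on any error, keep the original string
    tryCatch(eval(str2lang(z)), error = function(e) z)
  } else {
    z  # one-letter shortcuts such as "c" stay strings (avoids base::c)
  }
}

mySpec <- mySpecFile |>
  pull(Spec, Variable) |>  # named character vector: names = Variable
  as.list() |>
  lapply(to_collector)     # names are preserved by lapply()

dat <- read_csv(
  I("varInt,varChar,varFac\n1,a,A1\n2,b,A2\n3,c,A3"),
  col_types = mySpec
)
str(dat)
```

Evaluating text from a file runs arbitrary R code, so this is only reasonable for spec files you control.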