I am reading a large .txt file (>1 GB) into R via fread. I am reading the file directly from a .zip archive, via a bash command:
base = fread('unzip -p Folder.zip File.txt', sep = '|', header = FALSE,
stringsAsFactors = FALSE, na.strings="", quote = "", col.names = col_namesMain)
The text file separates entries with |, so a typical line might look like:
RRX|||02020||333293||||12123
However, in many places empty entries are denoted by adjacent separators with nothing between them, e.g. the || in the example line above.
When using fread, these adjacent separators are typically read in together as part of a single entry, so the line above returns the following entries:
RRX, ||02020|, 333293|||, 12123
when it should be read in as:
RRX, NA, NA, 02020, NA, 333293, NA, NA, NA, 12123
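For reference, the ten-field target above can be reproduced in base R by running read.table on the single example line. This is a sketch, not the original setup: colClasses = "character" is an added assumption so that 02020 keeps its leading zero, and skipNul is left out because the example line contains no embedded nul characters.

```r
# Parse the example line with base R's read.table to show the intended result.
line <- "RRX|||02020||333293||||12123"

df <- read.table(
  text = line,
  sep = "|",
  header = FALSE,
  quote = "",
  na.strings = "",           # empty fields between adjacent | become NA
  colClasses = "character"   # assumption: keep "02020" as text rather than 2020
)

print(df)
```

Each empty slot between two | characters becomes its own NA column, giving V1 to V10.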
I have tried using read.table with the option skipNul = TRUE, and this works perfectly. However, there doesn't seem to be an option similar to skipNul for fread. I would much prefer to use fread over read.table if possible, since I have several very large files. Despite searching, I haven't found much discussion of this problem. Any help is much appreciated.
This has been fixed in the data.table development version 1.12.3 on 15 Apr 2019 (see NEWS):