I am trying to load a .csv file with about 4 million columns and a few hundred rows using data.table's fread() in R. I ran it with verbose=TRUE, and here is the output, ending in the error:
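For reference, the call is essentially the following (the path is taken from the log below; any options beyond verbose are defaults):

```r
library(data.table)

# Wide file: ~4 million columns, a few hundred rows
dt <- fread("/q/combined.u.NA.ntwistbd.csv", verbose = TRUE)
```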
OpenMP version (_OPENMP) 201511
omp_get_num_procs() 20
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 2147483647
omp_get_max_threads() 20
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 10 threads with throttle==1024. See ?setDTthreads.
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 10 threads (omp_get_max_threads()=20, nth=10)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 0
0/1 column will be read as integer
[02] Opening the file
Opening file /q/combined.u.NA.ntwistbd.csv
File opened, size = 7.265GB (7800965634 bytes).
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<gene,chr1.10469.10470.cpg_inte>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=',' with 100 lines of 3986159 fields using quote rule 0
Detected 3986159 columns on line 1. This line is either column names or first data row. Line starts as: <<gene,chr1.10469.10470.cpg_inte>>
Quote rule picked = 0
fill=false and the most number of columns found is 3986159
[07] Detect column types, good nrow estimate and whether first row is column names
'header' changed by user from 'auto' to true
Number of sampling jump points = 1 because (7800965633 bytes from row 1 to eof) / (2 * 3658680597 jump0size) == 1
Type codes (jump 000) : C7777777777777777777777777777777777775577755777777777777775772777755777777777777...2222222222 Quote rule 0
Type codes (jump 001) : C7777777777777777777777777777777777775577777777777777777777772777755777777777777...2222222222 Quote rule 0
=====
Sampled 153 rows (handled \n inside quoted fields) at 2 jump points
Bytes from first data row on line 2 to the end of last row: 7456598287
Line length: mean=33644907.61 sd=-nan min=18359868 max=54585593
Estimated number of rows: 7456598287 / 33644907.61 = 222
Initial alloc = 406 rows (222 + 82%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
=====
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : C7777777777777777777777777777777777775577777777777777777777772777755777777777777...2222222222
[10] Allocate memory for the datatable
Allocating 3986159 column slots (3986159 - 0 dropped) with 406 rows
[11] Read the data
jumps=[0..1), chunk_size=33644907614, total_size=7456598287
2390 out-of-sample type bumps: C7777777777777777777777777777777777775577777777777777777777772777755777777777777...2222222222
*** caught segfault ***
address (nil), cause 'unknown'
Segmentation fault (core dumped)
Is this a memory issue? I am running it on a Linux machine with 188 GB of RAM, and the file itself is only about 7.3 GB. Any ideas?