How can I solve the memory problem that appears when you read the dataset?


Regardless of whether I set low_memory to True or False, I get the same error: MemoryError: Unable to allocate 13.5 GiB for an array with shape (4357, 415796) and data type float64.

Answer by BeRT2me:

low_memory=True only reduces memory usage while parsing; it does nothing about the total size of the data, which still has to fit in memory once loaded.

To process a file this large you'll need to work on it in chunks.

If some of your calculations genuinely need the whole dataset at once, look into out-of-core tools such as Dask or PySpark, which partition the data and run computations without loading everything into memory.

import pandas as pd

# With ~415,796 float64 columns, 400 rows per chunk is roughly
# 400 * 415796 * 8 bytes ≈ 1.24 GiB in memory at a time.
with pd.read_csv('file.csv', chunksize=400) as reader:
    for chunk in reader:
        ...  # Do stuff with each chunk.
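For many workloads, "do stuff with each chunk" means keeping a small running aggregate so only one chunk is ever in memory. Here is a minimal self-contained sketch of that pattern; it writes a tiny demo CSV first (the file name, columns, and chunk size are made up for illustration, not taken from the question):

```python
import pandas as pd

# Create a small demo file so the sketch is runnable end to end;
# in practice you would point read_csv at your real, large CSV.
pd.DataFrame({"a": range(10), "b": range(10, 20)}).to_csv("demo.csv", index=False)

# Stream the file in chunks, accumulating a sum so at most
# `chunksize` rows are held in memory at any moment.
total = 0
with pd.read_csv("demo.csv", chunksize=4) as reader:
    for chunk in reader:
        total += chunk["a"].sum()

print(total)  # sum of 0..9 = 45
```

The same shape works for counts, min/max, or group-wise sums; anything that can be merged across chunks avoids materializing the full DataFrame.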