PyArrow - Working with larger than memory Arrow IPC (feather) files?


I'm trying to work with Arrow IPC (feather) files that are larger than memory. PyArrow's documentation notes that "Arrow IPC files can be memory-mapped locally, which allow you to work with data bigger than memory". However, reading the file with the default memory map still runs out of memory. That would make sense if the default tries to allocate memory for the full size of the file.

My next thought was to specify a memory map size smaller than the file, similar to what the length argument of Python's built-in mmap allows. Unfortunately, I can't find any way to do this in PyArrow. pyarrow.feather.read_feather has no parameter for setting the memory map size. It does accept a lower-level MemoryMappedFile object, but the MemoryMappedFile documentation doesn't show a way to make the map smaller than the file being loaded either.

How do I work with an Arrow IPC file that's larger than memory?

