PyArrow - Working with larger than memory Arrow IPC (feather) files?

I'm trying to work with Arrow IPC (Feather) files that are larger than memory. PyArrow's documentation notes that "Arrow IPC files can be memory-mapped locally, which allow you to work with data bigger than memory". However, using the default memory map still overflows memory. That makes sense if the default behaviour is to map the full size of the file.

So my next thought was to specify a memory-map size smaller than the file, similar to the `length` argument of Python's built-in `mmap`. Unfortunately, I can't find any way to do that in PyArrow. `pyarrow.feather.read_feather` has no option to set the memory-map size explicitly, though it does let you pass a lower-level `MemoryMappedFile` object. But even in the `MemoryMappedFile` documentation, I can't find a way to make the mapping smaller than the file being loaded.

How do I work with an Arrow IPC file that's larger than memory?
Asked by Matt Robin · 426 views · 0 answers