Is there an R function or package to sort larger-than-RAM datasets on disk, similar to PROC SORT in SAS?


I am working with datasets distributed across multiple Parquet files that take up more than 100 GB on disk. Together they add up to approximately 2.4B rows and 24 columns.

I can work with the data using R/Arrow, and simple operations perform well. But when I need to sort by an ID that is scattered across different files, Arrow has to pull the data into memory first (collect()), and no amount of RAM seems to be enough.

From experience I know that SAS PROC SORT does most of its work on disk rather than in RAM. I was wondering whether there is an R package that takes a similar approach.
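To make clear what I mean by sorting on disk, here is a minimal sketch of an external merge sort, written in Python with only the standard library purely as an illustration of the technique. It is not an R solution and not how SAS actually implements PROC SORT. The function name `external_sort`, the `chunk_size` parameter, and the plain-text run files are all assumptions made for the example; in my case the runs would be sorted Parquet batches keyed by ID.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size):
    """Yield the integers in `values` in sorted order.

    Hypothetical helper: at most `chunk_size` items are sorted in memory
    at once; everything else lives in temporary files on disk.
    """
    run_paths = []

    def flush(chunk):
        # Sort one in-memory chunk and write it to disk as a sorted "run".
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)

    # Phase 1: split the input into sorted runs on disk.
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            flush(chunk)
            chunk = []
    if chunk:
        flush(chunk)

    # Phase 2: stream-merge the runs. heapq.merge reads lazily, so only
    # one pending item per run is held in memory at a time.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        for v in heapq.merge(*streams):
            yield v
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))` writes three small runs to disk and merges them back in order. Memory use is bounded by the chunk size plus one pending item per run, which is the property I am looking for in an R package.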

Any ideas on how to approach the problem in R, rather than buying a server with 256 GB of RAM? Thanks, R
