Polars memory increases in Jupyter


I am processing a dataset of 15 million rows from a Parquet file with Polars (Windows 11, Python 3.11.4, Polars 0.20.16).

When I run the script for the first time, Task Manager shows 5.7 GB of RAM usage for the Python process. Then, if I execute the following part of the script again:

MASTER = DF.join(G, how="left", on="uid").filter(
    (pl.col("num") > 1) & (pl.col("num") < 100)
).sort("num").with_columns(
    (pl.col("lvl_1") + pl.lit("_") + pl.col("lvl_2")).alias("cat")
)

RAM usage increases to 7.7 GB, then 9.6 GB, and so on.

Yes, I can just restart the notebook and run it again, but I want to understand why this is happening. As far as I can see, this part of the script always uses the same variables and does not create new ones, so the old data should be overwritten and RAM usage should not grow, because the old data is replaced by exactly the same data. At least in theory.
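
For what it's worth, here is what I would have expected to release the old data (a minimal sketch, assuming the notebook's output-history cache is what keeps the previous DataFrame alive; %xdel and gc are standard IPython/Python tools, not Polars-specific, and I am not sure they fully explain the growth I see):

import gc

# Assumption: re-running the cell rebinds MASTER, but IPython's output
# history (Out[], _, __, ___) may still reference the old DataFrame,
# in which case its memory cannot be reclaimed.
%xdel MASTER   # IPython magic: deletes MASTER and also tries to clear
               # it from IPython's internal caches, unlike a plain `del`
gc.collect()   # then ask Python to free unreachable objects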

Question: Any idea why this is happening, and what can I do to free the memory?
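
In case it is relevant: a hedged sketch of the same pipeline through Polars' lazy API, which should let the filter be pushed down into the Parquet scan so that rows outside the range are never fully materialized. The file name "data.parquet" is a placeholder for my actual source file, and G is assumed to be the same eager DataFrame as above:

import polars as pl

# "data.parquet" is a placeholder; G is assumed to be an eager DataFrame.
MASTER = (
    pl.scan_parquet("data.parquet")        # lazy scan instead of read_parquet
    .join(G.lazy(), how="left", on="uid")  # lazy join needs a LazyFrame here
    .filter((pl.col("num") > 1) & (pl.col("num") < 100))
    .sort("num")
    .with_columns((pl.col("lvl_1") + pl.lit("_") + pl.col("lvl_2")).alias("cat"))
    .collect()                             # materialize only the final result
)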
