Polars memory increases in Jupyter


I am processing a dataset of 15 million rows from a Parquet file with Polars (Windows 11, Python 3.11.4, Polars 0.20.16).

When I run the script for the first time, Task Manager shows 5.7 GB of RAM usage for the Python process. Then, if I execute the following part of the script again:

MASTER = DF.join(G, how="left", on="uid").filter(
    (pl.col("num") > 1) & (pl.col("num") < 100)
).sort("num").with_columns(
    (pl.col("lvl_1") + pl.lit("_") + pl.col("lvl_2")).alias("cat")
)

RAM usage increases to 7.7 GB, then 9.6 GB, and so on.

Yes, I can just restart the notebook and run it again, but I want to understand why this is happening. As far as I can see, this part of the script always uses the same variables and does not create new ones, so the old data should be overwritten and RAM usage should not grow, because the old data is replaced by exactly the same data. At least in theory.
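
For what it's worth, here is what I would have expected to release the old data (a minimal sketch, assuming the notebook's output-history cache is what keeps the previous DataFrame alive; %xdel and gc are standard IPython/Python tools, not Polars-specific, and I am not sure they fully explain the growth I see):

import gc

# Assumption: re-running the cell rebinds MASTER, but IPython's output
# history (Out[], _, __, ___) may still reference the old DataFrame,
# in which case its memory cannot be reclaimed.
%xdel MASTER   # IPython magic: deletes MASTER and also tries to clear
               # it from IPython's internal caches, unlike a plain `del`
gc.collect()   # then ask Python to free unreachable objects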

Question: Any idea why this is happening, and what can I do to free the memory?
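
In case it is relevant: a hedged sketch of the same pipeline through Polars' lazy API, which should let the filter be pushed down into the Parquet scan so that rows outside the range are never fully materialized. The file name "data.parquet" is a placeholder for my actual source file, and G is assumed to be the same eager DataFrame as above:

import polars as pl

# "data.parquet" is a placeholder; G is assumed to be an eager DataFrame.
MASTER = (
    pl.scan_parquet("data.parquet")        # lazy scan instead of read_parquet
    .join(G.lazy(), how="left", on="uid")  # lazy join needs a LazyFrame here
    .filter((pl.col("num") > 1) & (pl.col("num") < 100))
    .sort("num")
    .with_columns((pl.col("lvl_1") + pl.lit("_") + pl.col("lvl_2")).alias("cat"))
    .collect()                             # materialize only the final result
)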
