I am trying to bin values to prepare data that will later be fed into a plotting library.
For this I am trying to use polars Expr.cut. The dataframe I operate on contains different groups of values, and each group should be binned using different breaks. Ideally I would like to use np.linspace(BinMin, BinMax, 50) for the breaks argument of Expr.cut.
I managed to create the BinMin and BinMax columns in the dataframe, but I can't figure out how to use np.linspace to define the breaks dynamically for each row of the dataframe.
This is a minimal example of what I tried:
import numpy as np
import polars as pl
df = pl.DataFrame({"Value": [12], "BinMin": [0], "BinMax": [100]})
At this point the dataframe looks like:
┌───────┬────────┬────────┐
│ Value ┆ BinMin ┆ BinMax │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═══════╪════════╪════════╡
│ 12 ┆ 0 ┆ 100 │
└───────┴────────┴────────┘
Then I tried to use Expr.cut with dynamic breaks (the traceback below comes from an equivalent attempt that used range instead of np.linspace):
df.with_columns(pl.col("Value").cut(breaks=np.linspace(pl.col("BinMin"), pl.col("BinMax"), 50)).alias("Bin"))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 df.with_columns(pl.col("Value").cut(breaks=range(pl.col("BinMin"), pl.col("BinMax"))).alias("Bin"))
TypeError: 'Expr' object cannot be interpreted as an integer
I understand the error: np.linspace expects to be called with actual scalar values, not polars Expr objects. But I cannot figure out how to call it with dynamic breaks derived from the BinMin and BinMax columns.
Unfortunately, pl.Expr.cut doesn't support expressions for the breaks argument (yet); it requires a fixed sequence. (This would make a good feature request, though.)
A naive solution that works for DataFrames, but doesn't use polars' native expression API, would be to use pl.Expr.map_elements together with the corresponding functionality in numpy.