I'm new to Polars, and I ended up writing this code to compute some aggregating expression over segments of n rows:
import polars as pl
df = pl.DataFrame({"a": [1, 1, 3, 8, 62, 535, 4213]})
(
    df.with_columns(index=pl.int_range(pl.len(), dtype=pl.Int32))
    .group_by_dynamic(index_column="index", every="3i")
    .agg(pl.col("a").mean())
)
For the example I set n = 3 for 7 rows, but think of a smallish n of about 100 and a multi-column data frame of about 10**6 rows.
I was wondering if this is the idiomatic way of doing this type of operation. Somehow group_by_dynamic over an Int32 range seems like overkill to me: is there a more direct way of doing the same aggregation?
IMO your solution using group_by_dynamic already follows best practices when it comes to the aggregation. However, you can simplify the creation of the index column quite a bit using pl.DataFrame.with_row_index. As the resulting index column is unsigned (and group_by_dynamic only allows for a signed integer index column), you'll need to pass an expression doing the casting.
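For example, something along these lines should work (a sketch: it relies on with_row_index's default column name "index" and casts to Int32 to match the dtype used in your original snippet):

import polars as pl

df = pl.DataFrame({"a": [1, 1, 3, 8, 62, 535, 4213]})

(
    df.with_row_index()  # adds an unsigned (UInt32) row index column named "index"
    .group_by_dynamic(
        # cast the unsigned index to a signed dtype so group_by_dynamic accepts it
        index_column=pl.col("index").cast(pl.Int32),
        every="3i",
    )
    .agg(pl.col("a").mean())
)

This should give the same three windows as your int_range version (rows 0-2, 3-5, and 6).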