Custom Transformation Per Row Using Polars


I have a dataset with a working solution that uses a for loop with Polars. However, I would like to use a built-in Polars function such as map_elements to apply a function that builds a struct whose set of fields depends on the value of the id column.

Working Solution:


import polars as pl
pl.Config.set_fmt_str_lengths(1000)

import warnings
warnings.filterwarnings("ignore")

df = pl.DataFrame({
    "id": [1, 1, 2, 3],
    "dog": ["labrador", "labrador", "boxer", "airedale"],
    "age": [2, 2, 3, 4],
    "owner": ["Jim", "Paul", "Kim", "Lynne"],
    "spots": [None, None, True, None],
    "food": ["Raw", "Kibble", None, "Raw"],
    "leash": [True, False, None, True]
})

id_struct_dict = {
    1: ["owner", "food"],
    2: ["owner", "spots", "food"],
    3: ["owner", "food", "leash"]
}

run = 0
for id, lst in id_struct_dict.items():
    print(id)
    struct_expr = pl.struct(lst).map_elements(lambda x: str(x)).alias("struct")
    if run == 0:
        df = df.with_columns([pl.when(pl.col("id") == id).then(struct_expr).otherwise(None)])
        run += 1
    else:
        df = df.with_columns([pl.when(pl.col("id") == id).then(struct_expr).otherwise(pl.col('struct'))])

df

I really dislike my solution, but it is the only way I have found to get the result I want. I also tried mapping a function (a UDF) in a different way, but then the last struct definition was applied to every row, regardless of its id.

Any ideas/help would be appreciated. Thank you.

Answer by jqurious:

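You can pass multiple when/then expressions to pl.coalesce to produce a single result column. A when/then without an .otherwise() evaluates to null wherever its condition is false, so for each row coalesce returns the value from the one branch whose id matched.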

It is possible to "stringify" structs with .struct.json_encode() if JSON is acceptable.

df.with_columns(
   pl.coalesce(
      pl.when(pl.col("id") == _id)
        .then(pl.struct(cols).struct.json_encode())
      for _id, cols in id_struct_dict.items()
   )
   .alias("struct")
)
shape: (4, 8)
┌─────┬──────────┬─────┬───────┬───────┬────────┬───────┬─────────────────────────────────────────────┐
│ id  ┆ dog      ┆ age ┆ owner ┆ spots ┆ food   ┆ leash ┆ struct                                      │
│ --- ┆ ---      ┆ --- ┆ ---   ┆ ---   ┆ ---    ┆ ---   ┆ ---                                         │
│ i64 ┆ str      ┆ i64 ┆ str   ┆ bool  ┆ str    ┆ bool  ┆ str                                         │
╞═════╪══════════╪═════╪═══════╪═══════╪════════╪═══════╪═════════════════════════════════════════════╡
│ 1   ┆ labrador ┆ 2   ┆ Jim   ┆ null  ┆ Raw    ┆ true  ┆ {"owner":"Jim","food":"Raw"}                │
│ 1   ┆ labrador ┆ 2   ┆ Paul  ┆ null  ┆ Kibble ┆ false ┆ {"owner":"Paul","food":"Kibble"}            │
│ 2   ┆ boxer    ┆ 3   ┆ Kim   ┆ true  ┆ null   ┆ null  ┆ {"owner":"Kim","spots":true,"food":null}    │
│ 3   ┆ airedale ┆ 4   ┆ Lynne ┆ null  ┆ Raw    ┆ true  ┆ {"owner":"Lynne","food":"Raw","leash":true} │
└─────┴──────────┴─────┴───────┴───────┴────────┴───────┴─────────────────────────────────────────────┘