I have a dataset and a working solution that uses a for loop with Polars. However, I would like to use a built-in Polars function such as map_elements to apply a function that creates a struct of variable length based on an id column.
Working Solution:
import polars as pl
pl.Config.set_fmt_str_lengths(1000)
import warnings
warnings.filterwarnings("ignore")
df = pl.DataFrame({
    "id": [1, 1, 2, 3],
    "dog": ["labrador", "labrador", "boxer", "airedale"],
    "age": [2, 2, 3, 4],
    "owner": ["Jim", "Paul", "Kim", "Lynne"],
    "spots": [None, None, True, None],
    "food": ["Raw", "Kibble", None, "Raw"],
    "leash": [True, False, None, True]
})
id_struct_dict = {
    1: ["owner", "food"],
    2: ["owner", "spots", "food"],
    3: ["owner", "food", "leash"]
}
run = 0
for id, lst in id_struct_dict.items():
    print(id)
    struct_expr = pl.struct(lst).map_elements(lambda x: str(x)).alias("struct")
    if run == 0:
        df = df.with_columns([pl.when(pl.col("id") == id).then(struct_expr).otherwise(None)])
        run += 1
    else:
        df = df.with_columns([pl.when(pl.col("id") == id).then(struct_expr).otherwise(pl.col('struct'))])
df
I really dislike my solution, but it is the only way I was able to achieve my desired result. I tried mapping a function/udf via a different method, but the final struct case would apply to all rows despite the difference in ID.
Any ideas/help would be appreciated. Thank you.
You can pass multiple when/then expressions to pl.coalesce in order to produce a single column result.

It is possible to "stringify" structs with .struct.json_encode() if JSON is acceptable.