Considering
import polars as pl

df = pl.DataFrame({
    "a": [[1, 2], [3]],
    "b": [
        [{"id": 1, "x": 1}, {"id": 3, "x": 3}],
        [{"id": 3, "x": 4}],
    ],
})
That looks like:
+------+--------------+
|a     |b             |
+------+--------------+
|[1, 2]|[{1,1}, {3,3}]|
|[3]   |[{3,4}]       |
+------+--------------+
How to
- get one row for each flattened element of a, and
- if the list of dicts in b contains that a element as id, then have the corresponding x value in column b,
- otherwise b should be null?
Current approach
I .explode both a and b and then .filter, which is effectively an INNER JOIN:
df.explode("a").explode("b").filter(
pl.col("a") == pl.col("b").struct.field('id')
).select(
pl.col("a"),
pl.col("b").struct.field("x")
)
Unfortunately, as expected, I only get the matching rows:
+-+----+
|a|b |
+-+----+
|1|1 |
|3|4 |
+-+----+
Instead of the full "LEFT JOIN" result I am aiming for:
+-+----+
|a|b |
+-+----+
|1|1 |
|2|null|
|3|4 |
+-+----+
How to efficiently get the desired result when the DataFrame is structured like that?
As you mentioned, you can also use a left join explicitly:
With regard to your current filtering logic, you would also need to keep a single row for each group that has no matches.
But it's quite awkward and you still have to null out the non-matches in an extra step.