df.drop_duplicates() in polars?

120 Views Asked by Talha Tayyab At 22 February 2024 at 18:26

import polars as pl

df = pl.DataFrame(
    {
        "X": [4, 2, 3, 4],
        "Y": ["p", "p", "p", "p"],
        "Z": ["b", "b", "b", "b"],
    }
)

We know that the equivalent of pandas' df.drop_duplicates() in python-polars is df.unique().

But each time I execute my query, the rows come back in a different order:

print(df.unique())

X   Y   Z
i64 str str
3   "p" "b"
2   "p" "b"
4   "p" "b"

X   Y   Z
i64 str str
4   "p" "b"
2   "p" "b"
3   "p" "b"

X   Y   Z
i64 str str
2   "p" "b"
3   "p" "b"
4   "p" "b"

Is this intentional and what is the reason behind it?


Talha Tayyab On 22 February 2024 at 18:26 BEST ANSWER

Yes, this is intentional.

If you need a consistent row order, use:

df.unique(maintain_order=True)

polars.DataFrame.unique

maintain_order

Keep the same order as the original DataFrame. This is more expensive to compute. Setting this to True blocks the possibility to run on the streaming engine.

Maintaining order is not streaming-friendly as it requires bringing together all the chunks in memory to compare the order of the rows.

With this change of default the developers want to ensure that Polars is ready to work with datasets of all sizes while allowing users to choose different behaviour if desired.

A related point is the choice of which row within each group of duplicates is kept by unique. In pandas this defaults to the first row of each group. In Polars the default is keep="any", as this again allows more optimizations.

Other functions that take a maintain_order parameter (note that the default differs between them) include:

1. group_by (maintain_order: bool = False)

2. partition_by (maintain_order: bool = True)

3. pivot (maintain_order: bool = True)

4. upsample (maintain_order: bool = False)

A detailed article on this topic by @LiamBrannigan: https://www.rhosignal.com/posts/polars-ordering/
