Any idea how I can imitate the pandas code below using Polars? Polars doesn't have indexes like pandas, so I couldn't figure out how to do it.
import pandas as pd
df = pd.DataFrame(data = ([21,123], [132,412], [23, 43]), columns = ['c1', 'c2']).set_index("c1")
print(df.loc[[23, 132]])
and it prints
| c1 | c2 |
|---|---|
| 23 | 43 |
| 132 | 412 |
The only Polars equivalent I could come up with is
import polars as pl
df = pl.DataFrame(data = ([21,123], [132,412], [23, 43]), schema = ['c1', 'c2'])
print(df.filter(pl.col("c1").is_in([23, 132])))
but it prints
| c1 | c2 |
|---|---|
| 132 | 412 |
| 23 | 43 |
which is okay, but the rows are not in the order I gave. I gave [23, 132] and want the output rows in that same order, as in the pandas output.
I can use a sort() afterwards, yes, but the original data I use this on has about 30 million rows, so I'm looking for something that's as fast as possible.
I suggest using a left join to accomplish this. This will maintain the order corresponding to your list of index values. (And it is quite performant.)

For example, let's start with this shuffled DataFrame.
And these index values:
We now perform a left join to obtain the rows corresponding to the index values. (Note that the list of index values is the left DataFrame in this join.)

Notice how the rows are in the same order as the index values. We can verify this:
And the performance is quite good. On my 32-core system, this takes less than a second.