Row-wise data from a Polars DataFrame indexable by column name

Asked by Nick K9 on 11 March 2024 at 16:08 (69 views)

I have two CSV files which I have read in and joined. Now I want to iterate through the rows in the resulting DataFrame and interact with various values by their column name. These are small files, and the final DataFrame has fewer than 200 rows in it. So while a performant solution would be nice, I'm much more interested in something that's easy to understand and concise.

I have found this answer which points to get_row_amortized. That gets me a Row, but I can't figure out how to get a value from the Row by column name. It's not get() or get_field(). There's also a warning about how comparatively slow this is, along with a suggested column-wise approach using iterators… which doesn't allow indexing by column name.

This other answer is the cleanest working solution I've found so far. It pulls columns out one at a time, casts them, and then groups the iterators into a single loop. I'm able to use izip!() to do all the columns this way and then destructure the results into a for loop:

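use itertools::izip;
use polars::prelude::*;

// In my real code this is generated from two joined DFs.
// The i64 suffix makes the integer columns Int64 so the .i64() casts below match;
// unsuffixed integer literals would default to i32.
let df = df!(
    "year" => &[1920i64, 1920, 1920, 1930, 1930],
    "city" => &["Boston", "Memphis", "Columbus", "Boston", "Seattle"],
    "val"  => &[5i64, 42, 19, 52, 8])?;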


let years = df.column("year")?.i64()?;
let cities = df.column("city")?.str()?;
let vals = df.column("val")?.i64()?;

for (year, city, val) in izip!(years, cities, vals) {
    println!("{:?} {:?} {:?}", year, city, val);
}

But this is quite verbose and un-DRY, especially as the number of columns grows.

In reading the docs, I thought that using the Struct datatype would be what I wanted, but again: I can't figure out how to pull out a value by column name. I thought this would "deserialize" each row into a Struct that I make, but it doesn't seem to be working that way.

I'm hoping I'm just missing something and there's a way to do this extremely common operation concisely.

Any help is appreciated!

Update: This is available via iter_rows() in Polars for Python, but the Polars team has ruled out exposing it for Rust.
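For illustration, the access pattern being asked for (look up a row's value by column name) can be sketched without any Polars calls at all: build a name-to-position map once, then wrap each row's values in a small view that consults it. This is only a sketch under stated assumptions, not a Polars API. `Cell`, `NamedRow` and `column_index` are hypothetical names invented here, and the rows are plain vectors standing in for whatever a DataFrame row yields. With Polars, the map would presumably be built from the frame's column names and the cells taken from a Row's inner values, but that wiring is an assumption and is not shown.

```rust
use std::collections::HashMap;

/// Hypothetical cell type standing in for a dynamically typed row value.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Int(i64),
    Str(String),
}

/// Hypothetical row view: borrows one row's cells plus a shared
/// column-name -> position map, so values can be fetched by name.
struct NamedRow<'a> {
    index: &'a HashMap<String, usize>,
    cells: &'a [Cell],
}

impl<'a> NamedRow<'a> {
    /// Returns the cell under `name`, or None if the column is unknown.
    fn get(&self, name: &str) -> Option<&'a Cell> {
        let pos = *self.index.get(name)?;
        self.cells.get(pos)
    }
}

/// Build the name -> position map once, outside the row loop.
fn column_index(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(pos, name)| (name.to_string(), pos))
        .collect()
}

fn main() {
    // Stand-in data mirroring the question's columns (assumed, not read from a DataFrame).
    let names = ["year", "city", "val"];
    let rows = vec![
        vec![Cell::Int(1920), Cell::Str("Boston".to_string()), Cell::Int(5)],
        vec![Cell::Int(1930), Cell::Str("Seattle".to_string()), Cell::Int(8)],
    ];

    let index = column_index(&names);
    for cells in &rows {
        let row = NamedRow { index: &index, cells: cells.as_slice() };
        println!("{:?} {:?} {:?}", row.get("year"), row.get("city"), row.get("val"));
    }
}
```

The map is built once rather than per row, so each lookup inside the loop is a single hash probe plus a slice index. For a frame of fewer than 200 rows the cost is negligible either way; the point of the wrapper is readability, not speed.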
