polars.read_ndjson(data) nulling out entire columns?


This issue is driving me crazy. I am using the following code to read a .jsonlines file into Python:

import polars as pl
routes = pl.read_ndjson(r'~<Path>\az-routes.jsonlines')  # raw string so that \a is not read as an escape

Yesterday, it was nulling out a different column or columns on each run: the description, location, or protection column would come back entirely null. Now it has settled into nulling out the 'location' column every time.

Here is some sample data. As you can see, the location data is present in the file:

{"route_name": "Bottom Shelf Lick Her", "grade": {"YDS": "V3-", "Font": "6A"}, "safety": "PG", "type": {"tr": true, "boulder": true}, "fa": "Joe Jenson", "description": ["Starts sitting to the right of the small roof and passes to the left under the roof, up the left side of the roof and back around the top of the roof to connect with either a high ball finish or a good jump down point. Crux is passing under the roof on the left and reaching the top of the roof, requires a tricky finger jam and a tiny crimp with limited footholds, mostly smears after you come out from under the roof."], "location": ["Continue driving down FS 136 past the waterfall for approximately half a mile until you see a very obvious and very large granite boulder on the right side of the road. Park at on the left side of the road below the boulder. Approach hike is around 100 feet."], "protection": ["Can be set up as a top rope from a solid tree if you want to top out the climb, the very top of the boulder is around 25 feet."], "metadata": {"left_right_seq": "999999", "parent_lnglat": [-111.90802, 34.53523], "parent_sector": "Copper Canyon", "mp_route_id": "111893803", "mp_sector_id": "111892198", "mp_path": "Central Arizona|Copper Canyon"}}

Only rows that have data in the location column are changed to null; the original missing values (coded as '') are left intact.

In addition, I have a brand new non-fatal error when running this code today:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydev_bundle\pydev_console_utils.py", line 424, in execTableCommand
    success, res = exec_table_command(command, command_type,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\pydevd_tables.py", line 51, in exec_table_command
    res.append(table_provider.get_value_occurrences_count(table))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 92, in get_value_occurrences_count
    bin_counts.append(analyze_column(col, table[col]))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 110, in analyze_column
    column_visualisation_type, res = analyze_categorical_column(column, col_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev_pydevd_bundle\tables\pydevd_polars.py", line 133, in analyze_categorical_column
    value_counts = value_counts.sort("counts").reverse()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\name\anaconda3\Lib\site-packages\polars\dataframe\frame.py", line 4635, in sort
    .collect(_eager=True)
     ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\name\anaconda3\Lib\site-packages\polars\lazyframe\frame.py", line 1730, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: counts

I have no idea where to start with this issue. I spent hours yesterday fiddling with the rest of my code, wondering why it threw different errors every time I ran it, only to find that the errors depended on which columns polars had actually read into the dataframe. Please help!

Answer by BallpointBen:

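I think the issue is the use of empty strings to represent missing locations. Most rows store location as a list of strings, so the column ends up mixing str and list[str] values, and polars doesn't like mixed dtypes when deserializing. You can tell it that you're expecting list[str] as follows: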

pl.read_ndjson(filename, schema_overrides={'location': pl.List(pl.String)})
shape: (11_660, 9)
┌─────────────────┬─────────────────┬────────┬─────────────────┬───┬─────────────────┬─────────────────┬─────────────────┬─────────────────┐
│ route_name      ┆ grade           ┆ safety ┆ type            ┆ … ┆ description     ┆ location        ┆ protection      ┆ metadata        │
│ ---             ┆ ---             ┆ ---    ┆ ---             ┆   ┆ ---             ┆ ---             ┆ ---             ┆ ---             │
│ str             ┆ struct[8]       ┆ str    ┆ struct[5]       ┆   ┆ list[str]       ┆ list[str]       ┆ list[str]       ┆ struct[6]       │
╞═════════════════╪═════════════════╪════════╪═════════════════╪═══╪═════════════════╪═════════════════╪═════════════════╪═════════════════╡
│ Bottom Shelf    ┆ {"V3-",null,nul ┆ PG     ┆ {true,true,null ┆ … ┆ ["Starts        ┆ ["Continue      ┆ ["Can be set up ┆ {"999999",[-111 │
│ Lick Her        ┆ l,null,null,nul ┆        ┆ ,null,null}     ┆   ┆ sitting to the  ┆ driving down FS ┆ as a top rope   ┆ .90802,         │
│                 ┆ l,…             ┆        ┆                 ┆   ┆ right of…       ┆ 136 p…          ┆ fr…             ┆ 34.53523]…      │
│ Up and Away     ┆ {"5.7","5a","15 ┆        ┆ {true,null,null ┆ … ┆ ["Start on      ┆ null            ┆ ["Toperope. Set ┆ {"999999",[-111 │
│                 ┆ ","V+","13","MV ┆        ┆ ,null,null}     ┆   ┆ small ledge     ┆                 ┆ anchors at      ┆ .6482,          │
│                 ┆ S …             ┆        ┆                 ┆   ┆ above a b…      ┆                 ┆ Tope"…          ┆ 33.5471],"…     │
│ Wumba           ┆ {"V3-",null,nul ┆        ┆ {null,true,null ┆ … ┆ ["Begin with    ┆ ["This climb    ┆ null            ┆ {"1",[-111.8118 │
│                 ┆ l,null,null,nul ┆        ┆ ,null,null}     ┆   ┆ your hands      ┆ will be         ┆                 ┆ 7,              │
│                 ┆ l,…             ┆        ┆                 ┆   ┆ around t…       ┆ straight in…    ┆                 ┆ 33.87609],"Blu… │
│ tendrfoot       ┆ {"5.4","4a","12 ┆        ┆ {null,null,true ┆ … ┆ ["Left most     ┆ ["Climb from    ┆ ["Bolts"]       ┆ {"1",[-111.0021 │
│                 ┆ ","IV","10","VD ┆        ┆ ,null,null}     ┆   ┆ route"]         ┆ shelf to        ┆                 ┆ 4,              │
│                 ┆ 3…              ┆        ┆                 ┆   ┆                 ┆ shelf"]         ┆                 ┆ 34.40722],"Bea… │
│ Visions of      ┆ {"5.10","6b","2 ┆        ┆ {null,null,null ┆ … ┆ ["As Manny and  ┆ ["The route is  ┆ ["small to      ┆ {"8",[-110.9306 │
│ Yesteryears     ┆ 0","VII-","19", ┆        ┆ ,true,null}     ┆   ┆ I were setting  ┆ the obvious     ┆ large gear with ┆ 2,              │
│                 ┆ "E…             ┆        ┆                 ┆   ┆ up…             ┆ left …          ┆ bolte…          ┆ 33.81839],"Wor… │
│ …               ┆ …               ┆ …      ┆ …               ┆ … ┆ …               ┆ …               ┆ …               ┆ …               │
│ Easy Undercling ┆ {"V1-2",null,nu ┆        ┆ {null,true,null ┆ … ┆ ["Starts        ┆ ["Left of 'Easy ┆ null            ┆ {"0",[-111.4506 │
│                 ┆ ll,null,null,nu ┆        ┆ ,null,null}     ┆   ┆ matched on      ┆ Breezy' and     ┆                 ┆ 6,              │
│                 ┆ ll…             ┆        ┆                 ┆   ┆ undercling b…   ┆ 'Eas…           ┆                 ┆ 33.71177],"Eas… │
│ Easy Crack      ┆ {"V0-1",null,nu ┆        ┆ {null,true,null ┆ … ┆ ["A mellow      ┆ ["Obvious crack ┆ null            ┆ {"1",[-111.4506 │
│                 ┆ ll,null,null,nu ┆        ┆ ,null,null}     ┆   ┆ crack that      ┆ left of 'Easy   ┆                 ┆ 6,              │
│                 ┆ ll…             ┆        ┆                 ┆   ┆ veers up a…     ┆ Br…             ┆                 ┆ 33.71177],"Eas… │
│ Easy Breezy     ┆ {"V2",null,null ┆        ┆ {null,true,null ┆ … ┆ ["Start on good ┆ null            ┆ null            ┆ {"2",[-111.4506 │
│                 ┆ ,null,null,null ┆        ┆ ,null,null}     ┆   ┆ ledge hold, use ┆                 ┆                 ┆ 6,              │
│                 ┆ ,"…             ┆        ┆                 ┆   ┆ …               ┆                 ┆                 ┆ 33.71177],"Eas… │
│ Double Barrel   ┆ {"V3-4",null,nu ┆        ┆ {null,true,null ┆ … ┆ ["A small but   ┆ ["North facing  ┆ ["little pad"]  ┆ {"999999",[-111 │
│                 ┆ ll,null,null,nu ┆        ┆ ,null,null}     ┆   ┆ fun             ┆ compression     ┆                 ┆ .45159,         │
│                 ┆ ll…             ┆        ┆                 ┆   ┆ compression. G… ┆ betwe…          ┆                 ┆ 33.7124],…      │
│ Big Girtha      ┆ {"V3",null,null ┆        ┆ {null,true,null ┆ … ┆ ["A short and   ┆ ["Directly      ┆ ["little pad"]  ┆ {"999999",[-111 │
│                 ┆ ,null,null,null ┆        ┆ ,null,null}     ┆   ┆ burly one. Go   ┆ north of        ┆                 ┆ .45159,         │
│                 ┆ ,"…             ┆        ┆                 ┆   ┆ up b…           ┆ 'Double Barr…   ┆                 ┆ 33.7124],…      │
└─────────────────┴─────────────────┴────────┴─────────────────┴───┴─────────────────┴─────────────────┴─────────────────┴─────────────────┘

Note the correctly inferred null location in row 2 (Up and Away).
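
If you want to check which other columns have the same problem before picking overrides, here is a rough sketch that uses only the standard library and counts the JSON value types seen under each top-level key. The helper names (json_type_counts, mixed_type_keys) and the two-record sample are made up for illustration; they are not part of polars or of your file.

```python
import json
from collections import Counter, defaultdict


def json_type_counts(lines):
    """Count the Python type of each top-level value, per key, across NDJSON lines."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            counts[key][type(value).__name__] += 1
    return counts


def mixed_type_keys(counts):
    """Return the keys whose values have more than one non-null type."""
    return sorted(
        key
        for key, types in counts.items()
        if len([name for name in types if name != "NoneType"]) > 1
    )


# Hypothetical two-record sample shaped like the data in the question:
# one row stores location as a list, the other as an empty string.
sample = [
    '{"route_name": "A", "location": ["Park on the left."], "protection": ["Bolts"]}',
    '{"route_name": "B", "location": "", "protection": ["Bolts"]}',
]

counts = json_type_counts(sample)
print(mixed_type_keys(counts))  # ['location']
```

On the real file you would pass an open file handle instead of the sample list, since iterating over a text file yields one line at a time. Any key this reports is a candidate for a schema_overrides entry like the one above.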