How to replace all double quotes in a polars DataFrame?

I have a CSV file in which some strings are enclosed in double quotes (") and others are not. I read the file into a Polars DataFrame using Python.

The table looks something like this:

┌────────────┬────────┬────────┬────────┬─────────┬───────┐
│ address    ┆ id     ┆ lat    ┆ lon    ┆ name    ┆ state │ 
│ ---        ┆ ---    ┆ ---    ┆ ---    ┆ ---     ┆ ---   │ 
│ str        ┆ str    ┆ f64    ┆ f64    ┆ str     ┆ str   │ 
╞════════════╪════════╪════════╪════════╪═════════╪═══════╡

I want to remove all double quotes. The problem only occurs in columns address and name.

For example, some rows contain:

"Brückenstraße 1012, 89542 Herbrechtingen"

others look like this:

Brückenstraße 1012 89542 Herbrechtingen

and some others even have double quotes all over the place:

"Grundschule ""J. W. von Goethe"" "

I have tried the following, along with several variations, without success: df = df.with_columns(pl.col(["address"]).str.replace_all('"', '')). The variations included putting an r before '"', escaping the double quotes, and setting literal=True.

Edit: more reproducible data and code.

Data:

address,fax,full_time_school,id,lat,lon,name,official_id,phone,school_type,school_type_entity,state
"Rombacher Straße 30, 73430 Aalen",07361/9561-20,false,BW-75774,48.838598,10.08184,Schubart-Gymnasium Partnerschule für Europa,75774,07361/9561-0,Gymnasium (G8),Gymnasium,BW
Steinweg 8 91567 Herrieden,09825 4962,false,BY-6727,49.237197,10.497239,Volksschule Herrieden,6727,09825 219,Grund- und Hauptschule,Grund- und Hauptschule,BY

Code, run in an .ipynb notebook (Python 3.12):

import polars as pl
# Read only the columns of interest.
df = pl.read_csv(
    "data.csv", has_header=True,
    columns=["address", "id", "lat", "lon", "name", "state"],
    encoding="utf8", separator=",",
)
df = df.unique(subset=["id"])
df = df.drop_nulls(subset=["id", "lat", "lon"])
# Attempt to strip every double quote from the address column.
df = df.with_columns(pl.col("address").str.replace_all('"', ""))
df = df.sort(by="id")
df.write_csv("data-clean.csv")

Output in data-clean.csv:

address,id,lat,lon,name,state
"Rombacher Straße 30, 73430 Aalen",BW-75774,48.838598,10.08184,Schubart-Gymnasium Partnerschule für Europa,BW
Steinweg 8 91567 Herrieden,BY-6727,49.237197,10.497239,Volksschule Herrieden,BY