I have a CSV file in which some strings are enclosed in double quotes (") and others are not. I read the file into a polars DataFrame using Python.
The table looks something like this:
┌────────────┬────────┬────────┬────────┬─────────┬───────┐
│ address    ┆ id     ┆ lat    ┆ lon    ┆ name    ┆ state │
│ ---        ┆ ---    ┆ ---    ┆ ---    ┆ ---     ┆ ---   │
│ str        ┆ str    ┆ f64    ┆ f64    ┆ str     ┆ str   │
╞════════════╪════════╪════════╪════════╪═════════╪═══════╡
I want to remove all double quotes. The problem only occurs in the columns address and name.
For example, some rows contain:
"Brückenstraße 1012, 89542 Herbrechtingen"
others look like this:
Brückenstraße 1012 89542 Herbrechtingen
and some others even have double quotes all over the place:
"Grundschule ""J. W. von Goethe"" "
I have tried the following, along with several variations:
df = df.with_columns(pl.col(["address"]).str.replace_all('"', ''))
but without success.
The variations included prefixing the pattern with r (a raw string), escaping the double quote, and setting literal=True.
Edit: more reproducible data and code.
Data:
address,fax,full_time_school,id,lat,lon,name,official_id,phone,school_type,school_type_entity,state
"Rombacher Straße 30, 73430 Aalen",07361/9561-20,false,BW-75774,48.838598,10.08184,Schubart-Gymnasium Partnerschule für Europa,75774,07361/9561-0,Gymnasium (G8),Gymnasium,BW
Steinweg 8 91567 Herrieden,09825 4962,false,BY-6727,49.237197,10.497239,Volksschule Herrieden,6727,09825 219,Grund- und Hauptschule,Grund- und Hauptschule,BY
Code (run in a .ipynb notebook, Python 3.12):
import polars as pl
df = pl.read_csv(
    "data.csv",
    has_header=True,
    columns=["address", "id", "lat", "lon", "name", "state"],
    encoding="utf8",
    separator=",",
)
df = df.unique(subset=["id"])
df = df.drop_nulls(subset=["id", "lat", "lon"])
# Attempt to strip double quotes from the address column.
df = df.with_columns(pl.col(["address"]).str.replace_all('"', ""))
df = df.sort(by="id")
df.write_csv("data-clean.csv")
Output in data-clean.csv:
address,id,lat,lon,name,state
"Rombacher Straße 30, 73430 Aalen",BW-75774,48.838598,10.08184,Schubart-Gymnasium Partnerschule für Europa,BW
Steinweg 8 91567 Herrieden,BY-6727,49.237197,10.497239,Volksschule Herrieden,BY
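One way to check whether the quotes are really part of the values, or are only CSV quoting around fields that contain a comma, is the small sketch below. It uses Python's standard csv module rather than polars, so whether polars' reader and writer follow exactly the same quoting convention is an assumption here, not something this confirms. The third data row combines the "Grundschule" name from above with a made-up address and id, and the variable names are purely illustrative.

```python
import csv
import io

# Sample data: rows 1 and 2 are shortened from the question; row 3 uses a
# hypothetical address and id together with the doubled-quote name.
raw = (
    "address,id,name\n"
    '"Rombacher Straße 30, 73430 Aalen",BW-75774,Schubart-Gymnasium\n'
    "Steinweg 8 91567 Herrieden,BY-6727,Volksschule Herrieden\n"
    '"Example Street 1, 00000 Town",XX-1,"Grundschule ""J. W. von Goethe"" "\n'
)

rows = list(csv.DictReader(io.StringIO(raw)))
addresses = [row["address"] for row in rows]
names = [row["name"] for row in rows]

# Enclosing quotes are consumed by the parser; only doubled quotes ("")
# inside a quoted field survive as a literal " in the value.
for value in addresses + names:
    print(repr(value))

# Removing the quotes that are genuinely part of a value.
cleaned_names = [name.replace('"', "") for name in names]

# Writing the data back: the writer re-adds quotes around any field that
# contains the separator, even though the value itself holds no quote.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["address", "name"])
for address, name in zip(addresses, cleaned_names):
    writer.writerow([address, name])
written = out.getvalue()
print(written)
```

If the parsed address values print without any quotes, then a replace on the address column has nothing to remove, and the quotes in the output file would be coming from the writer quoting fields that contain a comma.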