Problem statement:
Initially what I had
I have a CSV file with the below records:-
data.csv:-
id,age,name 3500300026,23,"rahul" 3500300163,45,"sunita" 3500320786,12,"patrick" 3500321074,41,"Viper" 3500321107,54,"Dawn Breaker"
When I tried to run script.py on this with encoding 'ISO-8859-1', it's running fine
# script.py import pandas as pd test_data2=pd.read_csv('data.csv', sep=',', encoding='ISO-8859-1') print(test_data2)
Now what I have:-
But when I got a feed of the same file with
"at the front of every record, the parser behaved awkwardly. After the data change, new records looks like below:-
id,age,name "3500300026,23,"rahul" "3500300163,45,"sunita" "3500320786,12,"patrick" "3500321074,41,"Viper" "3500321107,54,"Dawn Breaker"
And after running the same script (script.py) for this new data file, I am getting the below result
Character " comes under ISO-8859-1 Character Set only so this can't be an issue anyway. It should be the parser, can't really get it why isn't the parser only focusing on , which I specifically passed as a separator to read_csv().
References: ISO-8859-1 Character set
I am curious to know the reason why pandas was not able to parse it properly or does it has any special connection with ".


You can tell pandas that you don't want double quotes to be treated specially by adding an argument to read_csv:
to
read_csv(). The output will be:parsing only on the comma.