I am loading data into a pandas DataFrame from an Excel sheet, and many columns contain non-display characters that I want to convert.
The most prevalent is an apostrophe used in a contraction; e.g. doesn't, which comes out as doesn’t.
In the past I have used:
str.encode('ascii', errors='ignore').decode('utf-8')
but this required me to know which columns I needed to fix.
In this case I have 103 columns, each of which could contain this or other similar issues.
I am looking for a way to replace all such issues across the entire DataFrame.
Is there a quick and easy way to do this over the entire DataFrame without having to pass each column to a function?
When reading the Excel file, you should add
encoding='utf-8'
or use
encoding='unicode-escape'
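
If the encoding argument is not accepted by your pandas version, or the characters are still there after loading, another option is to run the encode/decode step from the question over every text column in one pass. The following is only a minimal sketch: `to_ascii`, `clean_dataframe` and the `REPLACEMENTS` table are hypothetical names made up for illustration, and the table assumes you want typographic quotes turned into plain ASCII quotes before anything else is dropped.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

# Assumed mapping: typographic quotes -> plain ASCII quotes.
# Extend it with whatever characters show up in your data.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the "doesn’t" case)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def to_ascii(value):
    """Clean one cell; anything that is not a string is returned unchanged."""
    if isinstance(value, str):
        value = value.translate(REPLACEMENTS)
        # Same idea as in the question: drop whatever is still non-ASCII.
        return value.encode("ascii", errors="ignore").decode("ascii")
    return value


def clean_dataframe(df):
    """Return a copy of df with every text column passed through to_ascii."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Only touch text columns; numeric and datetime columns are skipped.
        if is_object_dtype(cleaned[col]) or is_string_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].map(to_ascii)
    return cleaned
```

Usage would be something like `df = clean_dataframe(pd.read_excel("your_file.xlsx"))`, where the file name is a placeholder. Note that `errors='ignore'` silently deletes any character that is not in the replacement table, so accented letters are lost; if that matters, add them to the table or skip the encode/decode step.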