How would I change/remove 'non-printable' characters, e.g. Â, from df.columns values, incorporating the regex statements already in place?

138 Views Asked by Peter R At 14 February 2023 at 13:04

I have tried the following with no success. Note: this is specific to the text column headings, not the column values.

df.columns = [x.lower().replace(" ","").replace("?","").replace("_","").replace( "Â" , "") for x in df.columns]

I expected this to remove the non-printable character, but it did not.

Can anyone help?


There is 1 best solution below

Pawel Kam On 14 February 2023 at 18:11

First of all, remember that replace is case-sensitive. Also, when chaining methods, the order matters: if lower() runs first, "Â" has already become "â" by the time replace("Â", "") is called, so nothing matches.

"Â".lower().replace("Â", "") # "â"
"Â".replace("Â", "").lower() # ""
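Applied to the list comprehension from the question, the same idea might look like the sketch below. The helper name clean_column_name and the sample headings are made up for illustration, and the regex character class simply mirrors the three replace calls from the question (space, "?", "_"). The key point is only that "Â" is removed before lower() is called.

```python
import re

def clean_column_name(name: str) -> str:
    # Remove the stray character first, while it is still upper case.
    name = name.replace("Â", "")
    # Drop spaces, question marks and underscores in a single regex pass.
    name = re.sub(r"[ ?_]", "", name)
    # Lowercase last, so the earlier case-sensitive replace can match.
    return name.lower()

# Hypothetical column headings, standing in for df.columns.
columns = ["Unit Price Â", "In_Stock?", "Â Name"]
cleaned = [clean_column_name(c) for c in columns]
print(cleaned)  # ['unitprice', 'instock', 'name']
```

With a real DataFrame, the result could be assigned back with df.columns = [clean_column_name(c) for c in df.columns].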

If the underlying cause is a mojibake encoding/decoding issue, you can try a quick fix with the ftfy library. You can use it in conjunction with the rename function.

import ftfy

def _change_column_name(val):
    # fix mojibake
    val = ftfy.fix_text(val)
    # whatever data processing you need
    return val.replace("Â", "").lower()

df.rename(columns=_change_column_name, inplace=True)

@tripleee is right, though. Rather than applying a quick fix, you may want to fix the encoding/decoding errors in your source data.
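To illustrate what fixing it at the source means, here is a minimal sketch of one common way a stray "Â" appears: a UTF-8 non-breaking space that gets decoded as Latin-1. The heading "Unit Price" is hypothetical, and this is only one possible cause, so the encodings involved in your data may differ.

```python
# Hypothetical heading containing a non-breaking space (U+00A0).
original = "Unit\u00a0Price"

# Writing as UTF-8 but reading as Latin-1 turns U+00A0 into "Â" + U+00A0.
mojibake = original.encode("utf-8").decode("latin-1")
print(repr(mojibake))  # 'UnitÂ\xa0Price'

# Reversing the wrong decode recovers the original text.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

If the data comes from a file, passing the correct encoding when loading it (for example, the encoding argument of pandas.read_csv) avoids the problem, and no replace is needed afterwards.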
