How to drop and keep only certain non alphanumeric characters?


I have a df that looks like this:

email                                    id
{'email': ['[email protected]']}           {'id': ['123abc_d456_789_fgh']}

When I drop non-alphanumeric characters like so:

df.email = df.email.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.email = df.email.str.replace('email', '')

df.id = df.id.str.replace('[^a-zA-Z0-9]', '', regex=True)
df.id = df.id.str.replace('id', '')

The columns look like this:

email                    id
testtestcom              123abcd456789fgh

How do I keep everything inside the square brackets unchanged and drop all non-alphanumeric characters outside the brackets?

The new df should look like this:

email                        id
[email protected]                123abc_d456_789_fgh

There are 2 best solutions below

Gianmar On BEST ANSWER

This is hardcoded, but works:

df.email = df.email.str.replace(r".+\['|'].+", '', regex=True)
df.id = df.id.str.replace(r".+\['|'].+", '', regex=True)

>>> '[email protected]'
>>> '123abc_d456_789_fgh'
The fourth bird On

According to the comments, what you might do is capture what is in between the square brackets in a capturing group.

In the replacement use the first capturing group.

\{'[^']+':\s*\['([^][]+)'\]}

That will match

  • \{ Match {
  • '[^']+' Match ', then any character except ' one or more times, then '
  • : Match : literally
  • \s*\[' Match zero or more whitespace characters, then ['
  • ([^][]+) Capturing group 1: match any character except [ or ] one or more times
  • '\] Match ']
  • } Match } literally
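
As a minimal sketch of using that pattern with the first capturing group as the replacement, the snippet below applies it to plain strings with Python's re module. The helper name unwrap and the address user@example.com are placeholders introduced here for illustration; only the id value comes from the question.

```python
import re

# Pattern from the answer above: group 1 is the text between [' and '].
PATTERN = re.compile(r"\{'[^']+':\s*\['([^][]+)'\]}")


def unwrap(cell: str) -> str:
    """Replace a matching cell with group 1; leave non-matching text as-is."""
    return PATTERN.sub(r"\1", cell)


# The id is the sample from the question; the email address is a placeholder.
rows = {
    "email": "{'email': ['user@example.com']}",
    "id": "{'id': ['123abc_d456_789_fgh']}",
}
cleaned = {name: unwrap(cell) for name, cell in rows.items()}
print(cleaned)
```

On a DataFrame whose columns hold strings, the same pattern and replacement should carry over to something like df.email.str.replace(pattern, r'\1', regex=True), where pattern is the raw string above.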
