How to create new pandas columns from a pandas data frame column content

27 Views Asked by At

I have the following pandas dataframe:

import pandas as pd
import numpy as np


ds1 = {'Name':["Juan Pablo Montoya","Jose Hiero","Martin Vasquez"], "Comments" : ["DOB 18 May 1967; POB Mexico.","POB Mexico.","-0-"]}

df1 = pd.DataFrame(data=ds1)

Which looks like this:

print(df1)
                 Name                      Comments
0  Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.
1          Jose Hiero                   POB Mexico.
2      Martin Vasquez                           -0-

I need to create two new columns based on the contents of the Comments column. The names of the two new columns are:

  • DOB
  • POB

The values in column DOB is the value following DOB in the Comments column (up until the semi-colon). The values in column POB is the value following POB in the Comments column (up until the dot). If there is a value of "-0-" in the Comments then both new columns contain NaN.

So, from the example above, the resulting data frame would look like this: enter image description here

2

There are 2 best solutions below

0
Pep_8_Guardiola On BEST ANSWER

You can do this with str.extract():

df1['DOB'] = df1['Comments'].str.extract('DOB (.*);+')
df1['POB'] = df1['Comments'].str.extract('POB (.*)')

Output:

    Name                Comments                        DOB           POB
0   Juan Pablo Montoya  DOB 18 May 1967; POB Mexico.    18 May 1967   Mexico.
1   Jose Hiero          POB Mexico.                     NaN           Mexico.
2   Martin Vasquez      -0-                             NaN           NaN
2
Learning is a mess On

It cane be done with regex parsing:

df1['DOB'] = df1['Comments'].str.extract(r'DOB (\d{1,2} \w+ \d{4})')
df1['POB'] = df1['Comments'].str.extract(r'POB ([a-zA-Z\- .]+)')