How can I split a column in pandas


I have a DataFrame with a column that contains both text and digits:

Prod_nbr | prod_name
5        | Natural chip companyseasalt175g
66       | cC Nacho cheese 172g
61       | Smiths Crinkle cut chips chicken135g

My desired output is

Prod_nbr | pack | prod_name
5        | 175g | Natural chip....
66       | 172g | cC Nacho cheese..
61       | 135g | Smiths Crinkle...

I tried the code below, but it did not give my desired output:

df['pack'] = df['prod_name'].str.extract(r'\d+\s*(\w{,5})\b').fillna('')[0]

2 Answers

alec_djinn (best answer)

I would write a custom function to parse the field, then apply it row by row to the whole DataFrame. I prefer this approach because real data usually contains some unexpected strings, and a function makes it easy to adjust the output when needed.

Here is a quick example.

import re

import numpy as np

def parse(row):
    s = row.prod_name
    # Find every run of digits followed by "g", e.g. "175g"
    matches = re.findall(r'\d+g', s)
    if matches:
        if len(matches) == 1:
            return matches[0]  # exactly one match: the pack size
        else:
            return 'parsing error'  # multiple unexpected matches
    return np.nan  # no matches


df['pack'] = df.apply(parse, axis=1)
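
To see how the function behaves, here is a minimal, self-contained sketch that runs it on the three rows from the question plus two made-up rows (one with two matches, one with none). The last two product names are hypothetical and only there to exercise the error branches.

```python
import re

import numpy as np
import pandas as pd

# First three rows come from the question; the last two are
# hypothetical edge cases added for illustration only.
df = pd.DataFrame({
    "Prod_nbr": [5, 66, 61, 98, 99],
    "prod_name": [
        "Natural chip companyseasalt175g",
        "cC Nacho cheese 172g",
        "Smiths Crinkle cut chips chicken135g",
        "Hypothetical multipack 2x90g 180g",  # two "<digits>g" matches
        "Hypothetical crackers",              # no match at all
    ],
})


def parse(row):
    # Collect every run of digits immediately followed by "g".
    matches = re.findall(r"\d+g", row.prod_name)
    if len(matches) == 1:
        return matches[0]
    if matches:
        return "parsing error"  # more than one candidate
    return np.nan  # nothing found


df["pack"] = df.apply(parse, axis=1)
print(df["pack"].tolist())
# ['175g', '172g', '135g', 'parsing error', nan]
```

Rows flagged as 'parsing error' can then be filtered out and inspected by hand before you tighten the pattern.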

Pep_8_Guardiola

Your regex isn't matching as expected. Try this instead:

df['pack'] = df['prod_name'].str.extract(r"(\d+g)$")

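If the column position matters, as in your expected output, insert the new column at index 1 instead. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame:

df.insert(1, 'pack', df['prod_name'].str.extract(r"(\d+g)$", expand=False))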
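
For reference, a small self-contained sketch of this approach on the three rows from the question. It also runs the pattern from the question for comparison: there the capture group sits after `\d+`, so it picks up only what follows the digits. The variable names `pack` and `old_attempt` are just for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Prod_nbr": [5, 66, 61],
    "prod_name": [
        "Natural chip companyseasalt175g",
        "cC Nacho cheese 172g",
        "Smiths Crinkle cut chips chicken135g",
    ],
})

# Pattern from the question: the digits are outside the capture
# group, so only the text after them is returned.
old_attempt = df["prod_name"].str.extract(r"\d+\s*(\w{,5})\b", expand=False)

# Corrected pattern: capture the digits and the trailing "g",
# anchored to the end of the string.
pack = df["prod_name"].str.extract(r"(\d+g)$", expand=False)

# Place the new column right after Prod_nbr (position 1).
df.insert(1, "pack", pack)

print(list(df.columns))
# ['Prod_nbr', 'pack', 'prod_name']
print(df["pack"].tolist())
# ['175g', '172g', '135g']
```

Because the pattern is anchored with `$`, a product name that has trailing text or whitespace after the pack size would come back as NaN, so it is worth checking `df['pack'].isna().sum()` on the real data.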