How to split the nested lists in pandas


I am working on a dataset and encountered a problem. One column contains values in the form of nested lists. For example, this is one of the values: [Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]. I want to split this value so that 'Tomatillo-Red Chili Salsa (Hot)' goes into one newly created column and 'Black Beans, Rice, Cheese, Sour Cream' goes into another.

I have tried this approach

choices = Dataset['choice_description'].str.extract(r'\[([^[\]]*)\]', expand=False)

Dataset[['choice1', 'choice2', 'choice3']] = choices.apply(lambda x: pd.Series(str(x).split(', ')))

'choice_description' is the name of the column that holds the nested-list values. choice1, choice2 and choice3 are the newly created columns.

When I run the above code, I get unexpected output: 'Tomatillo-Red Chili Salsa (Hot)' ends up in the choice1 column, which is correct, but the choice2 column contains only 'Black Beans' instead of the whole 'Black Beans, Rice, Cheese, Sour Cream', and the rest, 'Rice, Cheese, Sour Cream', ends up in the choice3 column. Why am I getting this output? I want 'Tomatillo-Red Chili Salsa (Hot)' in the choice1 column and 'Black Beans, Rice, Cheese, Sour Cream' in the choice2 column.


There is 1 solution below

Answered by gunnar

Assuming that the strings representing nested lists always have the same structure, i.e. '[some item, [list, of, additional, items]]', you could use:

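Dataset[['choice1', 'choice2']] = (
    Dataset['choice_description']
    .str.lstrip('[')   # strip the leading '[' of the outer list
    .str.rstrip(']')   # strip the trailing ']]' (both closing brackets)
    # split on the literal ', [' that separates the first item from the inner list
    .str.split(', [', expand=True, regex=False)
)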

It's ugly and inflexible, but it will get the two parts separated.
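If you prefer a single pattern over chained strip/split calls, here is a minimal sketch using `str.extract` with two named groups. It makes the same assumption about the string structure as above. The sample rows and the `pattern` variable are made up for illustration and are not taken from your dataset.

```python
import pandas as pd

# Hypothetical sample data; the first row is the value quoted in the question.
Dataset = pd.DataFrame({
    "choice_description": [
        "[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]",
        "[Fresh Tomato Salsa, [Rice, Black Beans, Cheese]]",
        "[Diet Coke]",  # no inner list, so the pattern will not match
    ]
})

# choice1: text between the outer '[' and the first ', ['.
# choice2: text inside the inner brackets, commas included.
pattern = r"^\[(?P<choice1>.*?), \[(?P<choice2>.*)\]\]$"

# str.extract returns one column per named group; rows that do not match get NaN.
Dataset = Dataset.join(Dataset["choice_description"].str.extract(pattern))

print(Dataset[["choice1", "choice2"]])
```

Note that nothing here splits on ', ' by itself: a plain `split(', ')` breaks on every comma-space, including the ones inside the inner list, which is why the inner items get spread over several columns.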