pandas explode: how can I restore it to its original form after a merge?

173 Views Asked by At

I use pandas explode to make my work and i need to restore it to its original form after a merge. how can make it?

edit my first post It's my fault that I didn't give you the full details and so in my real problem your solution doesn't work... It doesn't work for the following reasons:I have 3 burst columns and I don't have one column with unique values but two with a unique combination Could you help me with this please? I edited my original post

import pandas as pd

data = {'color': ['black green', 'red', 'white orange', 'jack', 'blue  green'],
        'Name': ['Tom', 'red tan', 'krish', 'jack bac', 'bob'],
        'Name_2': ['Karl', 'johans', 'bill jack', 'frank', 'teo'],
        'category': ['a', 'b', 'a', 'b', 'b'],
        'level_1': [2, 2, 3, -4, -2]}
        
data_2 = {'color': ['green', 'red','orange'],
          'Name': ['Tom', 'tan','krish'],
          'Name_2': ['Karl', 'johans','bill'],
          'category': ['a', 'b', 'a'],
          'level_1': [2, 2,3]}
df = pd.DataFrame(data)
df_2 = pd.DataFrame(data_2)
print(df)
print(df_2)

df['color'] = df['color'].str.strip().str.split(r'\s+', regex=True)
df = df.explode('color').reset_index(drop=True)
print(df)

df['Name'] = df['Name'].str.strip().str.split(r'\s+', regex=True)
df = df.explode('Name').reset_index(drop=True)
print(df)

df['Name_2'] = df['Name_2'].str.strip().str.split(r'\s+', regex=True)
df = df.explode('Name_2').reset_index(drop=True)
print(df)

merge = pd.merge(df, df_2, on=['color', 'Name','Name_2','level_1'], how='outer', indicator=True)
print(merge)

i need solution for both is:

data_3 = {'color': ['black green', 'red',' white orange'],
          'Name': ['Tom', 'red tan','krish'],
          'Name_2': ['Karl', 'johans','bill jack'],
          'category': ['a', 'b','a'],
          'level_1': [2, 2,3],
         }
df_solution = pd.DataFrame(data_3)
print(df_solution)
1

There are 1 best solutions below

2
Timeless On BEST ANSWER

IIUC, you can use :

tmp = df.copy()
cols = ["color", "Name", "Name_2"]
tmp[cols] = tmp[cols].apply(lambda x: x.str.split("\s+"))

def xpl(df, col):
    return df.explode(col)

matches = (
    xpl(tmp, "color")
        .pipe(lambda x: xpl(x, "Name"))
        .pipe(lambda x: xpl(x, "Name_2"))
        .reset_index().merge(df_2)["index"]
)

# matches gives [0, 1, 2]
df_solution = df.loc[matches]

Output :

print(df_solution)

          color     Name     Name_2 category  level_1
0   black green      Tom       Karl        a        2
1           red  red tan     johans        b        2
2  white orange    krish  bill jack        a        3

Old answer :

I don't understand the context but you can still try this :

def implode(df, col, sep=" "):
    col_agg = df.columns.difference([col, "level_1"])[0]
    return (
        df.groupby(col, as_index=False, sort=False)
        .agg({col_agg: lambda x: " ".join(x), "level_1": "first"})
    )
    
df_solution = implode(df, "Name").pipe(lambda x: implode(x, "color"))

Output :

print(df_solution)

          color      Name  level_1
0   black green       Tom        2
1           red  nick tan       -3
2  white orange     krish        7
3          jack  jack bac       -4
4    blue green       bob       -2