Python split a column value into multiple columns and keep remaining column same

71 Views Asked by At

I have my data as below. In that, i want the values of column A to be split into different columns and have values of corresponding column2 values and need column3 to contain the corresponding value of the group.Please note that this is just a sample and there are multiple such groups with their corresponding value

Input

Column1 Column2 Column3
Group1  Value1  V1
Group1  Value2  V2
Group1  Value3  V3
Group1  Value4  V4
Group2  Value1  x1
Group2  Value2  x2
Group2  Value3  x3
Group2  Value4  x4
Group3  Value1  y1
Group3  Value2  y2

Expected Output:

Group1 Group2 Group3 Column3.Group1 Column3.Group2 Column3.Group3
Value1 Value1 Value1  v1             x1                y1
Value2 Value2 Value2  v2             x2                y1
Value3 Value3 NaN     v3             x3                NaN
Value4 Value4 NaN     v4             x4                NaN

Anyway to achive this in python? Searched the forum, but i couldnt get..New to python, so an explanation and solution would help. TIA

3

There are 3 best solutions below

1
Panda Kim On BEST ANSWER
tmp = df.assign(cc=df.groupby('Column1').cumcount())
out = pd.concat(
    [tmp.pivot(index='cc', columns='Column1', values='Column2'), 
     tmp.pivot(index='cc', columns='Column1', values='Column3').add_prefix('Column3.')
    ], axis=1).rename_axis(index=None, columns=None)

out:

   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN
0
jezrael On

Use GroupBy.cumcount with DataFrame.pivot and last flatten MultiIndex in list comprehension:

out = (df.assign(g=df.groupby('Column1').cumcount())
         .pivot(index='g', columns='Column1')
         .rename_axis(None))
out.columns = [b if a == 'Column2' else f'{a}.{b}' for a, b in out.columns]
print (out)
   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN

If original DataFrame has more columns and need processing only Column1, Column2, Column3 add parameter value:

out = (df.assign(g=df.groupby('Column1').cumcount())
         .pivot(index='g', columns='Column1', values=['Column2','Column3'])
         .rename_axis(None))
out.columns = [b if a == 'Column2' else f'{a}.{b}' for a, b in out.columns]
print (out)
   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN
0
mozway On

This is a classical pivot with de-duplication, the logic is however usually performed to deduplicate the columns. Here we de-duplicate the rows.

For that perform a pivot after de-duplication of the columns with cumcount:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1')
   .rename_axis(index=None)
)

out.columns = out.columns.map('.'.join)

Output:

  Column2.Group1 Column2.Group2 Column2.Group3 Column3.Group1 Column3.Group2 Column3.Group3
0         Value1         Value1         Value1             V1             x1             y1
1         Value2         Value2         Value2             V2             x2             y2
2         Value3         Value3            NaN             V3             x3            NaN
3         Value4         Value4            NaN             V4             x4            NaN

For the original format in your first question:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1').droplevel(0, axis=1)
   .rename_axis(index=None, columns=None)
)

Output:

   Group1  Group2  Group3 Group1 Group2 Group3
0  Value1  Value1  Value1     V1     x1     y1
1  Value2  Value2  Value2     V2     x2     y2
2  Value3  Value3     NaN     V3     x3    NaN
3  Value4  Value4     NaN     V4     x4    NaN

I would personally recommend to keep the MultiIndex, this makes the hierarchy more explicit:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1')
   .rename_axis(index=None, columns=[None, None])
)

Output:

  Column2                 Column3              
   Group1  Group2  Group3  Group1 Group2 Group3
0  Value1  Value1  Value1      V1     x1     y1
1  Value2  Value2  Value2      V2     x2     y2
2  Value3  Value3     NaN      V3     x3    NaN
3  Value4  Value4     NaN      V4     x4    NaN