Problems transforming a dataset from wide to long with pandas


Consider the following columns of a dataset:

ONOFF_LEVER_V0, ONOFF_LEVER_V1, LEDD_DOPA_V0_categ_1, LEDD_DOPA_V1_categ_1

I'd like to transform this dataset into long format with the columns:

visit (0 or 1, corresponding to V0 and V1 respectively), ONOFF_LEVER, LEDD_DOPA_categ_1

I was thinking of using the pd.wide_to_long function, but in my case the V0 or V1 tag is not always at the end of the column name. Any advice? (This is a minimal example of my problem; my actual dataset is huge.)

Answer by jezrael:

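Use Series.str.extract together with Series.str.replace to build a MultiIndex on the columns (visit label on the first level, column name without the visit tag on the second), which makes it possible to reshape with DataFrame.stack: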

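import pandas as pd

# Sample data with the column names from the question
c = 'ONOFF_LEVER_V0, ONOFF_LEVER_V1, LEDD_DOPA_V0_categ_1, LEDD_DOPA_V1_categ_1'.split(', ')
df = pd.DataFrame(1, columns=c, index=[0, 1])

# Level 0: visit label (V0/V1); level 1: column name with the visit tag removed
df.columns = [df.columns.str.extract('_(V0|V1)', expand=False),
              df.columns.str.replace('_(V0|V1)', '', regex=True)]

# Move the visit level into the rows, drop the original row index, name the new column
df = df.stack(0).droplevel(0).rename_axis('visit').reset_index()
print(df)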
  visit  LEDD_DOPA_categ_1  ONOFF_LEVER
0    V0                  1            1
1    V1                  1            1
2    V0                  1            1
3    V1                  1            1
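
The question asks for visit as 0 or 1 and mentions pd.wide_to_long, so here is a rough sketch of an alternative along those lines. It is not part of the answer above. It assumes every visit tag has the form _V followed by digits and appears once per column name. The sample values, the helper column "row", and the variable names are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data using the column names from the question
cols = ["ONOFF_LEVER_V0", "ONOFF_LEVER_V1",
        "LEDD_DOPA_V0_categ_1", "LEDD_DOPA_V1_categ_1"]
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Move the visit tag to the end of each name,
# e.g. LEDD_DOPA_V0_categ_1 -> LEDD_DOPA_categ_1_V0
pattern = re.compile(r"^(.*?)_V(\d+)(.*)$")
renamed = df.rename(columns=lambda name: pattern.sub(r"\1\3_V\2", name))

# wide_to_long needs an id column; "row" is an assumed name for the original index
renamed = renamed.rename_axis("row").reset_index()

# Stub names are the column names without the trailing _V<number>
stubs = sorted({name.rsplit("_V", 1)[0] for name in renamed.columns if name != "row"})

long_df = (
    pd.wide_to_long(renamed, stubnames=stubs, i="row", j="visit",
                    sep="_V", suffix=r"\d+")
    .reset_index()
    .sort_values(["row", "visit"])
    .reset_index(drop=True)
)
print(long_df)
```

Because the suffix is purely numeric, wide_to_long converts visit to integers, and the "row" column keeps track of which original row each long-format row came from.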