How can I transform a DataFrame using pivot and melt in Python Pandas?

64 Views Asked by At

I have a DataFrame

         date     price num_floors    house
1  2023-01-01  94.30076          3        A
2  2023-01-01  95.58771          2        B
3  2023-01-02 102.78559          1        C
4  2023-01-03  93.29053          3        D

and I want to change it, so that each column contains the prices and num_floors for all houses for a given date. For one column, the first two rows of a column refer to the first house, the next two to the second house. The remaining entries without data are filled with the missing value NaN, like this:

  2023-01-01    2023-01-02  2023-01-03
1   94.30076     102.78559    93.29053
2          3             1           3         
3   95.58771            NA          NA
4          2            NA          NA

I succeed using R:

df_trans <- df %>%
    pivot_longer(-date) %>%
    mutate(index=row_number(), .by = date) %>%
    pivot_wider(id_cols = index, names_from = date, values_from = value) %>%
    select(-index)

but when I try with python, it does not work as I want:

df_trans = df.melt(id_vars='date')
df_trans['n'] = df_trans.groupby('date').cumcount() + 1
df_trans = df_trans.pivot(index='n', columns='date', values='value')

3

There are 3 best solutions below

0
Andrej Kesely On

Try:

df = df.drop(columns='house')
df['tmp'] = df.groupby('date').cumcount()
df = df.set_index(['date', 'tmp']).stack().unstack('date').reset_index(drop=True)
df.columns.name = None

print(df)

Prints:

   2023-01-01  2023-01-02  2023-01-03
0    94.30076   102.78559    93.29053
1     3.00000     1.00000     3.00000
2    95.58771         NaN         NaN
3     2.00000         NaN         NaN
0
Rawson On
pd.concat([j.set_index("date")[["price", "num_floors"]].T \
    for i, j in df.groupby(df.groupby("date").cumcount())])

In sections parts:

  1. Group by date and cumulative count, and group by this
  2. For each group, manipulate to desired output
  3. Concatenate groups
0
PaulS On

Another possible solution:

(pd.DataFrame(np.vstack([np.vstack([[x, y], [x, z]]) for x, y, z in
    zip(df['date'], df['price'], df['num_floors'])]))
 .pivot(columns=0, values=1).rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Alternatively,

(df.assign(
    price = [[x,y] for x,y in zip(df['price'], df['num_floors'])])
 .pivot(columns='date', values='price')
 .apply(lambda x: x.explode(ignore_index=True))
 .rename_axis(None, axis=1)
 .apply(lambda x: x.dropna(ignore_index=True)))

Output:

  2023-01-01 2023-01-02 2023-01-03
0   94.30076  102.78559   93.29053
1          3          1          3
2   95.58771        NaN        NaN
3          2        NaN        NaN