Looking to reindex dates based on a groupby of Grid Cell

58 Views Asked by At

I am attempting to do something similar to the reindexing in Pandas reindex date index by group revisited

except I have one extra level of indexing involved. My data is in the following structure: [Grid Cell, Site, Date, Value], and I want to reindex all sites within a particular grid cell to the same datetimeindex.

import pandas as pd
import numpy as np

data_dict = {'Grid Cell':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],'Site':['A','A','A','A','B','B','B','C','C','C','D','D','D','D'],'Date':['1999-01-01','1999-02-01','1999-03-01','1999-04-01','1999-01-01','1999-02-01','1999-03-01','2000-01-01','2000-02-01','2000-03-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01'],'Value':[-2.45,-3.72,1.34,4.56,0.23,3.26,6.76,-7.45,-6.43,-2.18,-10.72,-8.97,-5.32,-1.73]}
df = pd.DataFrame.from_dict(data_dict)

unique_dates = df.groupby('Grid Cell')['Date'].unique()

df = df.set_index('Date')
idx = pd.MultiIndex.from_product([df.index.unique(), df['Grid Cell'].unique()],names=['Date','Grid Cell'])

new_df = df.set_index("Grid Cell", append=True).reindex(idx, fill_value=np.nan).reset_index(level=1)

I am attempting to achieve a similar result to the linked example, such that all sites in Grid Cell 1, would be reindexed to the following datetimeindex: '1999-01-01', '1999-02-01', '1999-03-01', '1999-04-01', and all sites in Grid Cell 2 would be reindexed to: '2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', and the resultant dataframe would look like:

data_dict_out = {'Grid Cell':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],'Site':['A','A','A','A','B','B','B','B','C','C','C','D','D','D','D'],'Date':['1999-01-01','1999-02-01','1999-03-01','1999-04-01','1999-01-01','1999-02-01','1999-03-01','1999-04-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01'],'Value':[-2.45,-3.72,1.34,4.56,0.23,3.26,6.76,np.nan,-7.45,-6.43,-2.18,np.nan,-10.72,-8.97,-5.32,-1.73]}

df_out = pd.DataFrame.from_dict(data_dict_out)

Unfortunately my partial solution currently produces a "ValueError: cannot handle a non-unique multi-index!" error. Any suggestions on what I need to change here?

1

There are 1 best solutions below

0
Andrej Kesely On BEST ANSWER

IIUC, you can do:

def reindex(g):
    idx = pd.MultiIndex.from_product(
        [g["Grid Cell"].unique(), g["Site"].unique(), g["Date"].unique()],
        names=["Grid Cell", "Site", "Date"],
    )
    return g.set_index(["Grid Cell", "Site", "Date"]).reindex(idx, fill_value=np.nan)


df = df.groupby("Grid Cell", group_keys=False).apply(reindex).reset_index()
print(df)

Prints:

    Grid Cell Site        Date  Value
0           1    A  1999-01-01  -2.45
1           1    A  1999-02-01  -3.72
2           1    A  1999-03-01   1.34
3           1    A  1999-04-01   4.56
4           1    B  1999-01-01   0.23
5           1    B  1999-02-01   3.26
6           1    B  1999-03-01   6.76
7           1    B  1999-04-01    NaN
8           2    C  2000-01-01  -7.45
9           2    C  2000-02-01  -6.43
10          2    C  2000-03-01  -2.18
11          2    C  2000-04-01    NaN
12          2    D  2000-01-01 -10.72
13          2    D  2000-02-01  -8.97
14          2    D  2000-03-01  -5.32
15          2    D  2000-04-01  -1.73