I have a dataset containing nested dictionaries that I would like to unpack into a DataFrame with MultiIndex columns.
The DataFrame should have a top-level column for each year, with sub-columns for the result types in that year (yield and quality in this case).
Typical dataset:
datalist = [
    {
        'trial': 'efr1',
        'location': 'aberdeen',
        '2010': {'yield': '100', 'quality': '97'},
        '2011': {'yield': '90', 'quality': '87'},
        '2012': {'yield': '88', 'quality': '90'}
    },
    {
        'trial': 'efr2',
        'location': 'bristol',
        '2010': {'yield': '88', 'quality': '90'},
        '2011': {'yield': '75', 'quality': '82'},
        '2012': {'yield': '77', 'quality': '80'}
    },
    {
        'trial': 'axy1',
        'location': 'newcastle',
        '2010': {'yield': '91', 'quality': '95'},
        '2011': {'yield': '93', 'quality': '93'},
        '2012': {'yield': '75', 'quality': '97'}
    }
]
Using DataFrame.from_dict() produces a table whose cells contain the embedded dictionaries, and I cannot figure out how to unpack them and split them into sub-columns.
On the other hand, json_normalize() produces a flat table with compound headings, and I cannot figure out how to convert that into a MultiIndex frame with the structure I need.
If I understand correctly, you could use pandas.json_normalize, then convert the year.type columns to a MultiIndex (with str.split):
Variant:
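A minimal sketch of this approach, assuming pandas is imported as pd and datalist is the list from the question. The variable name df and the two-step layout are illustrative choices, not necessarily the code originally posted:

```python
import pandas as pd

datalist = [
    {'trial': 'efr1', 'location': 'aberdeen',
     '2010': {'yield': '100', 'quality': '97'},
     '2011': {'yield': '90', 'quality': '87'},
     '2012': {'yield': '88', 'quality': '90'}},
    {'trial': 'efr2', 'location': 'bristol',
     '2010': {'yield': '88', 'quality': '90'},
     '2011': {'yield': '75', 'quality': '82'},
     '2012': {'yield': '77', 'quality': '80'}},
    {'trial': 'axy1', 'location': 'newcastle',
     '2010': {'yield': '91', 'quality': '95'},
     '2011': {'yield': '93', 'quality': '93'},
     '2012': {'yield': '75', 'quality': '97'}},
]

# Flatten the nested dicts into columns such as '2010.yield',
# then move the identifying columns into the row index.
df = pd.json_normalize(datalist).set_index(['trial', 'location'])

# Split each remaining 'year.type' label on the dot; expand=True
# turns the resulting pairs into a two-level column MultiIndex.
df.columns = df.columns.str.split('.', expand=True)

print(df)
```

Each year is then a top-level column with yield and quality beneath it, so df['2010'] selects both result types for that year. The values remain strings, as in the input.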
Output:
Note that setting the first two columns as index is optional, but if you don't the MultiIndex will be:
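To see what happens without set_index, here is a small sketch using a shortened copy of the question's data (the names records, flat and cols are illustrative). Labels with no dot, such as trial and location, have nothing to fill the second level, so they get a missing value there:

```python
import pandas as pd

# First two records of the question's data, reduced to two years.
records = [
    {'trial': 'efr1', 'location': 'aberdeen',
     '2010': {'yield': '100', 'quality': '97'},
     '2011': {'yield': '90', 'quality': '87'}},
    {'trial': 'efr2', 'location': 'bristol',
     '2010': {'yield': '88', 'quality': '90'},
     '2011': {'yield': '75', 'quality': '82'}},
]

flat = pd.json_normalize(records)  # 'trial' and 'location' stay as columns

# Splitting every label, including the undotted ones, pads the
# second level of 'trial' and 'location' with a missing value.
cols = flat.columns.str.split('.', expand=True)

print(cols)
```

Setting those two columns as the index first avoids the padded entries.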
Intermediate after pd.json_normalize(datalist):
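A quick way to inspect that intermediate frame, sketched on a shortened copy of the question's data (records and flat are illustrative names). The nested keys are joined to their parent key with a dot, which is json_normalize's default separator:

```python
import pandas as pd

# First two records of the question's data, reduced to two years.
records = [
    {'trial': 'efr1', 'location': 'aberdeen',
     '2010': {'yield': '100', 'quality': '97'},
     '2011': {'yield': '90', 'quality': '87'}},
    {'trial': 'efr2', 'location': 'bristol',
     '2010': {'yield': '88', 'quality': '90'},
     '2011': {'yield': '75', 'quality': '82'}},
]

# One row per record; each nested key becomes a 'parent.child' column.
flat = pd.json_normalize(records)

print(flat.columns.tolist())
# ['trial', 'location', '2010.yield', '2010.quality', '2011.yield', '2011.quality']
```

These dotted labels are what str.split('.', expand=True) later separates into the year and result-type levels.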