Transform 4 level Dictionary to DataFrame

I have this 4-level dict (country, month, age group, revenue) of ice cream sales:

import pandas as pd
from operator import add

d1 = {
    'Sweden': {
        'jan':   {'0-5': 5, '6-8': 8, '9-10': 19, '11-15': 14, '16-18': 24},
        'march': {'0-5': 5, '6-8': 18, '9-10': 9, '11-15': 14, '16-18': 24},
        'feb':   {'0-5': 5, '6-8': 7, '9-10': 3, '11-15': 14, '16-18': 24},
    },
    'Norway': {
        'jan':   {'0-5': 25, '6-8': 8, '9-10': 45, '11-15': 14, '16-18': 24},
        'march': {'0-5': 2, '6-8': 8, '9-10': 88, '11-15': 14, '16-18': 24},
        'feb':   {'0-5': 5, '6-8': 48, '9-10': 9, '11-15': 39, '16-18': 24},
    },
}

I can unpack it to my desired DataFrame using a nested for loop:

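colnames = ['country', 'month', 'age', 'revenue']
lst = []
for i in d1.keys():
    for j in d1[i].keys():
        revenue = list(d1[i][j].items())
        # prepend (country, month) to every (age, revenue) pair;
        # len(revenue) avoids hard-coding the number of age groups
        l1 = list(map(add, [(i, j)] * len(revenue), revenue))
        lst.extend(l1)

df = pd.DataFrame.from_records(lst, columns=colnames)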

This gives a DataFrame of shape (30, 4).

Does pandas have a built-in function for doing this in a nicer/faster way, without for loops? What is the fastest way to do this?

Answer by mozway:

You can use pandas reshaping functions, but this is likely less efficient than a plain Python loop:

out = (pd.concat({k: pd.DataFrame(d).rename_axis(index='age', columns='month')
                  for k, d in d1.items()},
                 names=['country'])
         .stack().reset_index(name='revenue')
      )

Or:

s = pd.DataFrame(d1).stack()
out = (pd.DataFrame(s.tolist(), index=s.index).stack()
         .rename_axis(['month', 'country', 'age']).reset_index(name='revenue')
      )
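Note that the reshaping variants may return the columns (and rows) in a different order than the layout asked for in the question, since the order follows how the levels get stacked. Here is a minimal sketch of aligning the result, assuming a small stand-in dict named `sales` (hypothetical sample data with the same nesting as `d1`):

```python
import pandas as pd

# Hypothetical sample data: country -> month -> age group -> revenue
sales = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}

# First reshaping variant from above, applied to the sample data
out = (pd.concat({k: pd.DataFrame(d).rename_axis(index='age', columns='month')
                  for k, d in sales.items()},
                 names=['country'])
         .stack().reset_index(name='revenue')
      )

# Select columns by name so the layout does not depend on the stacking order,
# then sort by the key columns to get a deterministic row order
cols = ['country', 'month', 'age', 'revenue']
aligned = out[cols].sort_values(cols[:3]).reset_index(drop=True)
print(aligned)
```

Sorting gives an alphabetical row order, not the insertion order of the dict; skip the `sort_values` call if the row order does not matter.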

A variant of your code using a list comprehension, which is faster than the pandas reshaping approaches:

out = pd.DataFrame([(k1, k2, k3, v3) for k1, d in d1.items()
                    for k2, d2 in d.items()
                    for k3, v3 in d2.items()],
                    columns=['country', 'month', 'age', 'revenue'])

Output:

   country  month    age  revenue
0   Sweden    jan    0-5        5
1   Sweden    jan    6-8        8
2   Sweden    jan   9-10       19
3   Sweden    jan  11-15       14
4   Sweden    jan  16-18       24
5   Sweden  march    0-5        5
6   Sweden  march    6-8       18
7   Sweden  march   9-10        9
8   Sweden  march  11-15       14
9   Sweden  march  16-18       24
10  Sweden    feb    0-5        5
11  Sweden    feb    6-8        7
12  Sweden    feb   9-10        3
13  Sweden    feb  11-15       14
14  Sweden    feb  16-18       24
15  Norway    jan    0-5       25
16  Norway    jan    6-8        8
17  Norway    jan   9-10       45
18  Norway    jan  11-15       14
19  Norway    jan  16-18       24
20  Norway  march    0-5        2
21  Norway  march    6-8        8
22  Norway  march   9-10       88
23  Norway  march  11-15       14
24  Norway  march  16-18       24
25  Norway    feb    0-5        5
26  Norway    feb    6-8       48
27  Norway    feb   9-10        9
28  Norway    feb  11-15       39
29  Norway    feb  16-18       24
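The comprehension above is written for exactly three levels of keys. If the nesting depth can vary, a recursive generator is one possible generalization. This is only a sketch: the `flatten` helper and the `sales` sample dict are hypothetical names introduced for illustration, not part of pandas.

```python
import pandas as pd

def flatten(nested, prefix=()):
    """Yield one tuple per leaf: the keys along the path, then the leaf value."""
    for key, value in nested.items():
        if isinstance(value, dict):
            # descend one level, remembering the keys seen so far
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)

# Hypothetical sample data: country -> month -> age group -> revenue
sales = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}},
}

rows = list(flatten(sales))
df_flat = pd.DataFrame(rows, columns=['country', 'month', 'age', 'revenue'])
print(df_flat)
```

This assumes every leaf sits at the same depth, so that each tuple has as many entries as there are column names.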

Timings:

# list comprehension
148 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# pandas reshaping (1)
1.54 ms ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# pandas reshaping (2)
1.43 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
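The figures above look like IPython `%timeit` output. To run a similar comparison in a plain script, something along these lines should work with the standard `timeit` module. The function names, the `sales` sample dict, and the repeat count are assumptions for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical sample data: country -> month -> age group -> revenue
sales = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}

def with_comprehension(d):
    # plain Python flattening, then a single DataFrame construction
    return pd.DataFrame([(k1, k2, k3, v3) for k1, d1_ in d.items()
                         for k2, d2 in d1_.items()
                         for k3, v3 in d2.items()],
                        columns=['country', 'month', 'age', 'revenue'])

def with_reshape(d):
    # pandas reshaping (1): one small DataFrame per country, then concat + stack
    return (pd.concat({k: pd.DataFrame(v).rename_axis(index='age', columns='month')
                       for k, v in d.items()},
                      names=['country'])
              .stack().reset_index(name='revenue'))

n = 200  # number of repetitions per measurement (arbitrary choice)
t_comp = timeit.timeit(lambda: with_comprehension(sales), number=n) / n
t_reshape = timeit.timeit(lambda: with_reshape(sales), number=n) / n

print(f"list comprehension: {t_comp * 1e6:.0f} µs per call")
print(f"pandas reshaping:   {t_reshape * 1e6:.0f} µs per call")
```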