I am pivoting a DataFrame to get it into wide form, with one column per year and month. The data is by month and year, but not all months are present.
- How do I add the columns for the missing months and fill those with ZERO?
- How do I merge the two column levels Year and Month into a single Date identifier for the month?
The executable code is below.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Year': [2022, 2022, 2023, 2023, 2024, 2024],
    'Month': [1, 12, 11, 12, 1, 1],
    'Code': [None, 'John Johnson', np.nan, 'John Smith', 'Mary Williams', 'ted bundy'],
    'Unit Price': [np.nan, 200, None, 56, 75, 65],
    'Quantity': [1500, 140000, 1400000, 455, 648, 759],
    'Amount': [100, 10000, 100000, 5, 48, 59],
    'Invoice': ['soccer', 'basketball', 'baseball', 'football', 'baseball', 'ice hockey'],
    'energy': [100., 100, 100, 54, 98, 3],
    'Category': ['alpha', 'bravo', 'kappa', 'alpha', 'bravo', 'bravo']
})

index_to_use = ['Category', 'Code', 'Invoice', 'Unit Price']
values_to_use = ['Amount', 'Quantity']
columns_to_use = ['Year', 'Month']

df2 = df.pivot_table(index=index_to_use,
                     values=values_to_use,
                     columns=columns_to_use)
The solution should identify the years present in the data and add columns for the missing months, filled with ZERO or NaN. In the data above, for example, we have three years (2022, 2023 and 2024), but after pivoting we have data only for Dec in 2022 and 2023 and Jan in 2024. The output DataFrame should have Jan to Dec for all three years, with ZERO or NaN in the cells where the original DataFrame did not have data.
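To make the target concrete, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that reindexing the pivoted columns against a full year-by-month grid is acceptable, and that each month can be represented by the first day of that month. The names `full_columns`, `df_full` and `dates` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2022, 2022, 2023, 2023, 2024, 2024],
    'Month': [1, 12, 11, 12, 1, 1],
    'Code': [None, 'John Johnson', np.nan, 'John Smith', 'Mary Williams', 'ted bundy'],
    'Unit Price': [np.nan, 200, None, 56, 75, 65],
    'Quantity': [1500, 140000, 1400000, 455, 648, 759],
    'Amount': [100, 10000, 100000, 5, 48, 59],
    'Invoice': ['soccer', 'basketball', 'baseball', 'football', 'baseball', 'ice hockey'],
    'energy': [100., 100, 100, 54, 98, 3],
    'Category': ['alpha', 'bravo', 'kappa', 'alpha', 'bravo', 'bravo'],
})

values_to_use = ['Amount', 'Quantity']
df2 = df.pivot_table(index=['Category', 'Code', 'Invoice', 'Unit Price'],
                     values=values_to_use,
                     columns=['Year', 'Month'])

# Build every (value, year, month) combination for the years found in the data.
years = sorted(df['Year'].unique())
full_columns = pd.MultiIndex.from_product(
    [values_to_use, years, list(range(1, 13))],
    names=[None, 'Year', 'Month'],
)

# Reindexing creates the missing month columns; fillna(0) then replaces
# every empty cell (new columns and gaps in existing ones) with zero.
df_full = df2.reindex(columns=full_columns).fillna(0)

# Collapse the Year and Month levels into one Date level.
# Assumption: each month is represented by its first day.
dates = pd.to_datetime(pd.DataFrame({
    'year': df_full.columns.get_level_values('Year'),
    'month': df_full.columns.get_level_values('Month'),
    'day': 1,
}))
df_full.columns = pd.MultiIndex.from_arrays(
    [df_full.columns.get_level_values(0), dates.to_numpy()],
    names=[None, 'Date'],
)
```

Leaving out `.fillna(0)` would keep NaN in the empty cells instead of zero.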
Code
out: