Distributing Values in a Pandas dataframe with different time intervals

125 Views Asked by At

I have a pandas data frame which has the data sorted into quarters like this: ( I am showing a sample data)

 df 

  id    qtr value
  a     Q01 100
  a     Q02 130
  a     Q03 160
  a     Q04 100
  b     Q01 1000
  b     Q02 1300
  b     Q03 1600
  b     Q04 1000

Now the problem what I want to solve is to redistribute the quarter values to months as follows: I want to distribute the quarter value into 3 and assign first two months that value and the remaining to the last month in the quarter. So my output should be like follows

    outdf 


        id  qtr   mnth  value
         a  Q01  M01     30
         a  Q01  M02     30
         a  Q01  M03     40
         a  Q02  M04     40
         a  Q02  M05     40
         a  Q02  M06     50
         a  Q03  M07     50
         a  Q03  M08    50
         a  Q03  M09    60
         a  Q04  M10    30
         a  Q04  M11    30
         a  Q04  M12    40
         b  Q01  M01    300
         b  Q01  M02    300
         b  Q01  M03    400
         b  Q02  M04    400
         b  Q02  M05    400
         b  Q02  M06    500
         b  Q03  M07    500
         b  Q03  M08    500
         b  Q03  M09    600
         b  Q04  M10    300
         b  Q04  M11    300
         b  Q04  M12    400

So what I till now tried is as follows:

I created a mapping for quarters:

   quarters = {'Q01': ['M01','M02','M03'], 'Q02': ['M04','M05','M06'],
        'Q03': ['M07','M08','M09'], 'Q04': ['M10', 'M11', 'M12']}

and the tried to melt and explode the dataframe using this trick:

  out = (df.melt(['id'], value_name='value', var_name='qtr')
          .assign(rev=lambda d: d['value'],#.div(3),
          qtr=lambda d: d['qtr'].str[-2:].map(quarters)
         ).explode('qtr'))

Unfortunately its not doing what I want. Any help as to how to achieve my output will be helpful and appreciated.

3

There are 3 best solutions below

3
Prudhviraj Panisetti On BEST ANSWER

Just tried solving them in a little lengthy way. Hope this approach helps!

For now, I am using some custom roundings but, i hope you can continue from this

Below is the code

import pandas as pd

data = {'id': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
        'qtr': ['Q01', 'Q02', 'Q03', 'Q04', 'Q01', 'Q02', 'Q03', 'Q04'],
        'value': [100, 130, 160, 100, 1000, 1300, 1600, 1000]}

df = pd.DataFrame(data)

quarters_to_months = {
    'Q01': ['M01', 'M02', 'M03'],
    'Q02': ['M04', 'M05', 'M06'],
    'Q03': ['M07', 'M08', 'M09'],
    'Q04': ['M10', 'M11', 'M12']
}

result_df = pd.DataFrame(columns=['id', 'qtr', 'mnth', 'value'])

for _, row in df.iterrows():
    qtr = row['qtr']
    months = quarters_to_months[qtr]
    for i, month in enumerate(months):
        if i == 2:
            result_df = pd.concat([result_df,pd.DataFrame({'id': row['id'], 'qtr': qtr, 'mnth': month, 'value': (row['value'] * 0.4)}, index=[0])], ignore_index=True)
        else:
            result_df = pd.concat([result_df,pd.DataFrame({'id': row['id'], 'qtr': qtr, 'mnth': month, 'value': (row['value'] * 0.3)}, index=[0])], ignore_index=True)

result_df.sort_values(by=['id', 'qtr', 'mnth'], inplace=True)

result_df.reset_index(drop=True, inplace=True)



def custom_round(value):
    if(value>100):
        return round(value, -2)
    else:
        return round(value, -1)

result_df['value'] = result_df['value'].apply(custom_round)

print(result_df)

Output

enter image description here

0
sammywemmy On

Summary - convert quarters dictionary to a long form dataframe, merge with df and get the final output, based on the fraction percentages:

other = pd.DataFrame(quarters)
other['fraction'] = [4/13,4/13,5/13]
(df
.merge(other.melt(id_vars='fraction',var_name='qtr',value_name='month'))
.assign(value_ = lambda f: f.value * f.fraction,
        value  = lambda f: f.value_.mask(f.value_ < 100, f.value_.round(-1))
                                   .mask(f.value_ >= 100, f.value_.round(-2))
                                   .astype(int)
       )
.drop(columns=['fraction','value_'])
.sort_values(['id','qtr'], ignore_index=True)
.loc[:, ['id', 'qtr','month','value']]
 )

   id  qtr month  value
0   a  Q01   M01     30
1   a  Q01   M02     30
2   a  Q01   M03     40
3   a  Q02   M04     40
4   a  Q02   M05     40
5   a  Q02   M06     50
6   a  Q03   M07     50
7   a  Q03   M08     50
8   a  Q03   M09     60
9   a  Q04   M10     30
10  a  Q04   M11     30
11  a  Q04   M12     40
12  b  Q01   M01    300
13  b  Q01   M02    300
14  b  Q01   M03    400
15  b  Q02   M04    400
16  b  Q02   M05    400
17  b  Q02   M06    500
18  b  Q03   M07    500
19  b  Q03   M08    500
20  b  Q03   M09    600
21  b  Q04   M10    300
22  b  Q04   M11    300
23  b  Q04   M12    400

Breakdown:

  • Create a dataframe with the quarters dictionary, and attach a fraction value :
other = pd.DataFrame(quarters)
other['fraction'] = [4/13,4/13,5/13]
print(other) 
   Q01  Q02  Q03  Q04  fraction
0  M01  M04  M07  M10  0.307692
1  M02  M05  M08  M11  0.307692
2  M03  M06  M09  M12  0.384615
  • flip other into long form via pd.melt:
other_=other.melt(id_vars='fraction', var_name='qtr', value_name='month')
print(other_)
    fraction  qtr month
0   0.307692  Q01   M01
1   0.307692  Q01   M02
2   0.384615  Q01   M03
3   0.307692  Q02   M04
4   0.307692  Q02   M05
5   0.384615  Q02   M06
6   0.307692  Q03   M07
7   0.307692  Q03   M08
8   0.384615  Q03   M09
9   0.307692  Q04   M10
10  0.307692  Q04   M11
11  0.384615  Q04   M12
  • merge with df:
out = (df
       .merge(other_)
       .assign(value = lambda f: f.value.mul(f.fraction))
       )
print(out)
   id  qtr  value month
0   a  Q01     30   M01
1   a  Q01     30   M02
2   a  Q01     40   M03
3   b  Q01    300   M01
4   b  Q01    300   M02
5   b  Q01    400   M03
6   a  Q02     39   M04
7   a  Q02     39   M05
8   a  Q02     52   M06
9   b  Q02    390   M04
10  b  Q02    390   M05
11  b  Q02    520   M06
12  a  Q03     48   M07
13  a  Q03     48   M08
14  a  Q03     64   M09
15  b  Q03    480   M07
16  b  Q03    480   M08
17  b  Q03    640   M09
18  a  Q04     30   M10
19  a  Q04     30   M11
20  a  Q04     40   M12
21  b  Q04    300   M10
22  b  Q04    300   M11
23  b  Q04    400   M12
  • modify value column to match your expected output - you know your data better, this is one possible solution - you may have to do some tweaks:
out.value = (out
             .value
             .mask(out.value.lt(100), out.value.round(-1))
             .mask(out.value.ge(100), out.value.round(-2))
             .astype(int)
             )
print(out)
   id  qtr  value  fraction month
0   a  Q01     30  0.307692   M01
1   a  Q01     30  0.307692   M02
2   a  Q01     40  0.384615   M03
3   b  Q01    300  0.307692   M01
4   b  Q01    300  0.307692   M02
5   b  Q01    400  0.384615   M03
6   a  Q02     40  0.307692   M04
7   a  Q02     40  0.307692   M05
8   a  Q02     50  0.384615   M06
9   b  Q02    400  0.307692   M04
10  b  Q02    400  0.307692   M05
11  b  Q02    500  0.384615   M06
12  a  Q03     50  0.307692   M07
13  a  Q03     50  0.307692   M08
14  a  Q03     60  0.384615   M09
15  b  Q03    500  0.307692   M07
16  b  Q03    500  0.307692   M08
17  b  Q03    600  0.384615   M09
18  a  Q04     30  0.307692   M10
19  a  Q04     30  0.307692   M11
20  a  Q04     40  0.384615   M12
21  b  Q04    300  0.307692   M10
22  b  Q04    300  0.307692   M11
23  b  Q04    400  0.384615   M1
  • sort dataframe and drop fraction column:
( out
.sort_values(['id', 'qtr'], ignore_index=True)
.drop(columns='fraction')
.loc[:, ['id', 'qtr','month','value']]
)

   id  qtr month  value
0   a  Q01   M01     30
1   a  Q01   M02     30
2   a  Q01   M03     40
3   a  Q02   M04     40
4   a  Q02   M05     40
5   a  Q02   M06     50
6   a  Q03   M07     50
7   a  Q03   M08     50
8   a  Q03   M09     60
9   a  Q04   M10     30
10  a  Q04   M11     30
11  a  Q04   M12     40
12  b  Q01   M01    300
13  b  Q01   M02    300
14  b  Q01   M03    400
15  b  Q02   M04    400
16  b  Q02   M05    400
17  b  Q02   M06    500
18  b  Q03   M07    500
19  b  Q03   M08    500
20  b  Q03   M09    600
21  b  Q04   M10    300
22  b  Q04   M11    300
23  b  Q04   M12    400
0
Bushmaster On

As an alternative, you can use:

quarters = pd.DataFrame({'qtr': ['Q01','Q02','Q03',"Q04"], 'month': [['M01','M02','M03'],['M04','M05','M06'],['M07','M08','M09'],['M10', 'M11', 'M12']]})
dfx = df.merge(quarters).explode("month")
dfx["month_id"] = dfx.groupby(["id","qtr"]).cumcount() + 1
dfx
'''
  id  qtr  value month  month_id
0  a  Q01    100   M01         1
0  a  Q01    100   M02         2
0  a  Q01    100   M03         3
1  b  Q01   1000   M01         1
1  b  Q01   1000   M02         2
1  b  Q01   1000   M03         3
2  a  Q02    130   M04         1
2  a  Q02    130   M05         2
2  a  Q02    130   M06         3
3  b  Q02   1300   M04         1
3  b  Q02   1300   M05         2
3  b  Q02   1300   M06         3
4  a  Q03    160   M07         1
4  a  Q03    160   M08         2
4  a  Q03    160   M09         3
5  b  Q03   1600   M07         1
5  b  Q03   1600   M08         2
5  b  Q03   1600   M09         3
6  a  Q04    100   M10         1
6  a  Q04    100   M11         2
6  a  Q04    100   M12         3
7  b  Q04   1000   M10         1
7  b  Q04   1000   M11         2
7  b  Q04   1000   M12         3
'''

Now we have month numbers. Let's create a dict that multiplies the first two months by 4/13 and the third month by 5/13.

map_dict = {1:4/13,2:4/13,3:5/13}
dfx["value"] = dfx[["month_id","value"]].apply(lambda x: map_dict[x.month_id] * x.value,axis=1)
dfx["value"] = np.where(dfx["value"] < 100,dfx["value"].round(-1),dfx["value"].round(-2))
dfx = dfx.drop(["month_id"],axis=1).sort_values('qtr')

Out:

    id  qtr value   month
0   a   Q01 30.0    M01
0   a   Q01 30.0    M02
0   a   Q01 40.0    M03
1   b   Q01 300.0   M01
1   b   Q01 300.0   M02
1   b   Q01 400.0   M03
3   b   Q02 400.0   M05
3   b   Q02 400.0   M04
3   b   Q02 500.0   M06
2   a   Q02 40.0    M05
2   a   Q02 40.0    M04
2   a   Q02 50.0    M06
4   a   Q03 50.0    M07
4   a   Q03 50.0    M08
4   a   Q03 60.0    M09
5   b   Q03 500.0   M07
5   b   Q03 500.0   M08
5   b   Q03 600.0   M09
7   b   Q04 300.0   M11
6   a   Q04 30.0    M10
6   a   Q04 30.0    M11
6   a   Q04 40.0    M12
7   b   Q04 300.0   M10
7   b   Q04 400.0   M12