Resampling Partial Month to Full End of Month - Pandas


Putting this out there to see if there is a feasible solution for this.

Suppose I have monthly volumes that look like this:

enter image description here

But suppose the start date for these volumes falls partway through a month, say 7/15/2023 in this example.

enter image description here

Is there a way to resample the data so that it sums the correct volumes up to the end of each month? The answer should look like this:

enter image description here

I tried a simple resample, but it just shifts the dates to the end of the month, which is not what I want. Any ideas?

import pandas as pd

df = pd.DataFrame({"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]})
df["DATE"] = pd.to_datetime(df["DATE"])
# 'M' is the month-end frequency (spelled 'ME' from pandas 2.2 onward)
df = df.resample("M", on="DATE").sum().reset_index()
print(df)

There are 2 best solutions below

0
Andrej Kesely On BEST ANSWER

IIUC, you can do:

import calendar

import pandas as pd

df = pd.DataFrame(
    {"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]}
)


def get_days_in_month(year, month):
    return calendar.monthrange(year, month)[1]


df["DATE"] = pd.to_datetime(df["DATE"])

# Days from each date to the end of its month, and the length of that month
days_to_month_end = ((df["DATE"] + pd.offsets.MonthEnd()) - df["DATE"]).dt.days
days_in_month = df["DATE"].apply(lambda x: get_days_in_month(x.year, x.month))

# Share of each volume that falls before the end of its own month
tmp = df["VOL"] * days_to_month_end / days_in_month
# Add the remainder carried over from the previous row
df["RESULT"] = (df["VOL"].shift().fillna(0) - tmp.shift().fillna(0)) + tmp
print(df)

Prints:

        DATE  VOL      RESULT
0 2023-07-15  100   51.612903
1 2023-08-15  150  125.806452
2 2023-09-15  100  122.580645
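To spell out what RESULT holds: each row's volume is split into the part that falls before its own month end and a remainder, and the remainder is added to the next row. Below is a minimal sketch of the same arithmetic that also labels each row with its month-end date. The function name `prorate_to_month_end` is made up for illustration, `MonthEnd(0)` and `.dt.days_in_month` are swapped in for the `calendar` helper, and it assumes one row per consecutive month, sorted by date.

```python
import pandas as pd


def prorate_to_month_end(df, date_col="DATE", vol_col="VOL"):
    """Hypothetical helper: pro-rate each volume to month end and carry
    the remainder into the next row (assumes one row per consecutive month)."""
    dates = pd.to_datetime(df[date_col])
    # MonthEnd(0) rolls forward to the end of the current month
    # (a date already on a month end stays where it is)
    month_end = dates + pd.offsets.MonthEnd(0)
    days_left = (month_end - dates).dt.days
    # Part of each volume that falls before the end of its own month
    in_month = df[vol_col] * days_left / dates.dt.days_in_month
    # Remainder of each volume, moved down one row; the first row has no carry-over
    carried = (df[vol_col] - in_month).shift(fill_value=0)
    return pd.DataFrame({date_col: month_end, "RESULT": in_month + carried})


df = pd.DataFrame(
    {"DATE": ["07/15/2023", "08/15/2023", "09/15/2023"], "VOL": [100, 150, 100]}
)
print(prorate_to_month_end(df))
```

Note that the remainder of the last row has nowhere to go, so it is dropped, same as in the code above.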
2
Suraj Shourie On

You can get the month end date and get the day difference from that first:

import pandas as pd
df = pd.DataFrame({"DATE":["07/15/2023","08/15/2023","09/15/2023"], "VOL":[100, 150, 100]})

df['DATE'] = pd.to_datetime(df['DATE'])
# Get Month End date
df['DATE_ME'] = df['DATE'] + pd.tseries.offsets.MonthEnd(0)
# Share of each volume that falls between DATE and its month end
result = df['VOL'] * ((df['DATE_ME'].dt.day - df['DATE'].dt.day) / df['DATE_ME'].dt.day)
print(result)

OUTPUT:

0    51.612903
1    77.419355
2    50.000000
dtype: float64
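One detail worth checking if your start date can land exactly on a month end: the two answers use different offsets, `MonthEnd()` (which is `n=1`) and `MonthEnd(0)`. A small sketch of how they differ, with variable names made up for the example:

```python
import pandas as pd

mid_month = pd.Timestamp("2023-07-15")
month_end = pd.Timestamp("2023-07-31")

# For a mid-month date both offsets land on the same month end
print(mid_month + pd.offsets.MonthEnd(0))  # 2023-07-31
print(mid_month + pd.offsets.MonthEnd(1))  # 2023-07-31

# For a date already on a month end they differ:
# MonthEnd(0) stays put, MonthEnd(1) moves to the next month end
print(month_end + pd.offsets.MonthEnd(0))  # 2023-07-31
print(month_end + pd.offsets.MonthEnd(1))  # 2023-08-31
```

With the sample dates (all on the 15th) this makes no difference to either answer.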