ValueError: Cannot create a NumPy datetime other than NaT with generic units

710 Views Asked by At

I have generated this df

PredictionTargetDateEOM PredictionTargetDateBOM DayAfterTargetDateEOM   business_days
0   2018-12-31              2018-12-01              2019-01-01              20
1   2019-01-31              2019-01-01              2019-02-01              21
2   2019-02-28              2019-02-01              2019-03-01              20
3   2018-11-30              2018-11-01              2018-12-01              21
4   2018-10-31              2018-10-01              2018-11-01              23
              ...   ... ... ... ...
172422  2020-10-31          2020-10-01              2020-11-01              22
172423  2020-11-30          2020-11-01              2020-12-01              20
172424  2020-12-31          2020-12-01              2021-01-01              22
172425  2020-09-30          2020-09-01              2020-10-01              21
172426  2020-08-31          2020-08-01              2020-09-01              21

with this code:

predicted_df['PredictionTargetDateBOM'] = predicted_df.apply(lambda x: pd.to_datetime(x['PredictionTargetDateEOM']).replace(day=1), axis = 1) #Get first day of the target month
predicted_df['PredictionTargetDateEOM'] = pd.to_datetime(predicted_df['PredictionTargetDateEOM'])
predicted_df['DayAfterTargetDateEOM'] = predicted_df['PredictionTargetDateEOM'] + timedelta(days=1) #Get the first day of the month after target month. i.e. M+2
predicted_df['business_days_bankers'] = predicted_df.apply(lambda x: np.busday_count(x['PredictionTargetDateBOM'].date(), x['DayAfterTargetDateEOM'].date(), holidays=[list(holidays.US(years=x['PredictionTargetDateBOM'].year).keys())[index] for index in [list(holidays.US(years=x['PredictionTargetDateBOM'].year).values()).index(item) for item in rocket_holiday_including_observed if item in list(holidays.US(years=x['PredictionTargetDateBOM'].year).values())]] ), axis = 1) #Count number of business days of the target month

That counts the number of business days in the month of the PredictionTargetDateEOM column based on Python's holiday package, which is a dictionary that includes the following holidays:

2022-01-01 New Year's Day
2022-01-17 Martin Luther King Jr. Day
2022-02-21 Washington's Birthday
2022-05-30 Memorial Day
2022-06-19 Juneteenth National Independence Day
2022-06-20 Juneteenth National Independence Day (Observed)
2022-07-04 Independence Day
2022-09-05 Labor Day
2022-10-10 Columbus Day
2022-11-11 Veterans Day
2022-11-24 Thanksgiving
2022-12-25 Christmas Day
2022-12-26 Christmas Day (Observed)

However, I would like to replicate the business day count but instead use this list called rocket_holiday as the reference for np.busday_count():

["New Year's Day",
 'Martin Luther King Jr. Day',
 'Memorial Day',
 'Independence Day',
 'Labor Day',
 'Thanksgiving',
 'Christmas Day',
 "New Year's Day (Observed)",
 'Martin Luther King Jr. Day (Observed)',
 'Memorial Day (Observed)',
 'Independence Day (Observed)',
 'Labor Day (Observed)',
 'Thanksgiving (Observed)',
 'Christmas Day (Observed)']

So I've added this line predicted_df['business_days_rocket'] = predicted_df.apply(lambda x: np.busday_count(x['PredictionTargetDateBOM'].date(), x['DayAfterTargetDateEOM'].date(), holidays=[rocket_holiday]), axis = 1)

But I get the ValueError listed in the title of this question. I think the problem is that the first list is a dictionary with the dates of those holidays, so I need to write a function that could generate those dates for the holidays of the second list in a dynamic fashion based on year, and convert that list into a dictionary. Is there a way to do that with Python's holiday package so that I don't have to hard-code the dates in?

0

There are 0 best solutions below