I have 2 dataframes, Requests and Services. Below is an example of the data they contain:
Request Event Dataframe
Client ID Event Reference Event Start Date Event End Date
0 100 100 01/04/2023 04/04/2023
1 100 101 07/06/2023 07/06/2023
2 101 102 01/04/2023 04/04/2023
3 102 103 01/05/2023 10/05/2023
{'Client ID': {0: 100, 1: 100, 2: 101, 3: 102}, 'Event Reference': {0: 100, 1: 101, 2: 102, 3: 103}, 'Event Start Date': {0: '01/04/2023', 1: '07/06/2023', 2: '01/04/2023', 3: '01/05/2023'}, 'Event End Date': {0: '04/04/2023', 1: '07/06/2023', 2: '04/04/2023', 3: '10/05/2023'}}
Services Event Dataframe
Client ID Event Reference Event Start Date Event End Date
0 100 1000 01/02/2022 04/03/2023
1 100 1001 10/04/2023 NaN
2 102 1002 01/04/2023 01/05/2023
3 102 1003 02/05/2023 10/07/2023
{'Client ID': {0: 100, 1: 100, 2: 102, 3: 102}, 'Event Reference': {0: 1000, 1: 1001, 2: 1002, 3: 1003}, 'Event Start Date': {0: '01/02/2022', 1: '10/04/2023', 2: '01/04/2023', 3: '02/05/2023'}, 'Event End Date': {0: '04/03/2023', 1: nan, 2: '01/05/2023', 3: '10/07/2023'}}
Please note that all requests will have end dates, but some services will not, i.e. they are ongoing. Not all clients in the Requests dataframe will have a service.
I need to determine the following - for each request event for a specific client, did they have any service which was active during the request period. If they did then that request needs to go into a "currently open" dataframe. If they did not then that request needs to go into a "new request" dataframe. Based on the above data i would want to see the following result dataframes:
Currently Open Dataframe
Client ID Event Reference Event Start Date Event End Date
0 100 101 07/06/2023 07/06/2023
1 102 103 01/05/2023 10/05/2023
New Request Dataframe
Client ID Event Reference Event Start Date Event End Date
0 100 100 01/04/2023 04/04/2023
1 101 102 01/04/2023 04/04/2023
I am pretty new to python coding so I have been using ChatGPT to give me a leg-up. It came up with some code that I've been tweaking to no avail so far (please note LTS is my name for the services dataframe):
# Intialise an empty dataframe to store results
new_requests = pd.DataFrame()
# Iterate over each row in the requests dataframe
for index, request_row in requests.iterrows():
# Filter relevant rows in the LTS dataframe for the current client
relevant_lts_rows = lts[lts['Client ID'] == request_row['Client ID']]
# Check for overlapping events
overlapping_events = relevant_lts_rows[
(relevant_lts_rows['Event Start Date'] <= request_row['Event End Date']) &
((relevant_lts_rows['Event End Date'] > request_row['Event Start Date']) | pd.isnull(relevant_lts_rows['Event End Date']))]
# If no overlapping events, add current request to the new_requests dataframe
if overlapping_events.empty:
new_requests = pd.concat([new_requests, pd.DataFrame(request_row).transpose()])
# Reset index for new_requests dataframe
new_requests.reset_index(drop=True, inplace=True)
print(new_requests)
Unfortunately it does not seem to extract the correct requests into the new_requests dataframe. Any suggestions would be most appreciated!
Ok the issue has been solved, with help from someone on another forum :) The key problem is that my date fields are not being read as dates but as strings (I think). The answer is to force python to format my event start date and event end date columns as dates when I import my csv data:
date_columns = ["Event Start Date", "Event End Date"] date_formats = "%d/%m/%Y" rbc_data = pd.read_csv(rbc_cld, parse_dates=date_columns, date_format=date_formats)After that it appears that the correct request events are going into the correct dataframes based on checking if dates are overlapping.