Date-Based Record Partitioning

34 Views Asked by At

I'm working on a data integration task where I need to partition records from two dataframes based on their dates. Specifically, I have two dataframes "fd" and "ptmt" containing records with date information (UPI_Date(column) for "fd" and Transaction_Date_rl(column) for "ptmt"). My objective is to partition these records into different subsets (EFGH and ABCD) based on the dates and the number of records associated with each date.

ABCD Subset: This subset should contain records where the count of records for a particular date is equal between the two dataframes (fd and ptmt). These records are considered to have matching counts for the respective dates in both dataframes.

EFGH Subset:This subset should contain records where The count of records for a particular date is unequal between the two dataframes (fd and ptmt), indicating a discrepancy in the number of records for that date and also when date exists in one dataframe but not in the other and vice versa indicating missing records for that date in one of the dataframes should be put into EFGH

This was my original code:

case_indices["EFGH"]["fd"].extend(
    fd.index[~(fd["UPI_Date"].isin(ptmt['Transaction_Date_rl']))].to_list()
)

case_indices["EFGH"]["ptmt"].extend(
    ptmt.index[~(ptmt["Transaction_Date_rl"].isin(fd["UPI_Date"]))].to_list()
)


case_indices["ABCD"]["fd"].extend(
    fd.index[fd["UPI_Date"].isin(ptmt['Transaction_Date_rl'])].to_list()
)

case_indices["ABCD"]["ptmt"].extend(
    ptmt.index[ptmt["Transaction_Date_rl"].isin(fd["UPI_Date"])].to_list()
)

But this was not Robust Enough and was just checking if a date exists in one dataframe but not in the other and vice versa indicating missing records for that date in one of the dataframes and was put into EFGH case and if they existed on either dataframes they were put into ABCD

This is what i tried

case_indices["ABCD"]["fd"].extend(
  fd[fd['UPI_Date'].isin(fd['UPI_Date'].value_counts().eq(ptmt['Transaction_Date_rl'].value_counts())[lambda x: x].index)].index
)

case_indices["ABCD"]["ptmt"].extend(
ptmt[ptmt['Transaction_Date_rl'].isin(fd['UPI_Date'].value_counts().eq(ptmt['Transaction_Date_rl'].value_counts())[lambda x: x].index)].index
)


case_indices["EFGH"]["fd"].extend(
    fd.index[~fd.index.isin(case_indices["ABCD"]["fd"])].tolist()
)

case_indices["EFGH"]["ptmt"].extend(
    ptmt.index[~ptmt.index.isin(case_indices["ABCD"]["ptmt"])].tolist()
)

Could someone please provide guidance on how to correctly implement this partitioning logic using Pandas?

0

There are 0 best solutions below