Writing a recursive CTE

50 Views Asked by At

I am having a hard time figuring out how to write a recursive CTE for my use-case.

**TEMP** TABLE (CAN BE MODIFIED): Employee_Money_Accounts 
-- Primary key on Employer_ID, Account_ID
id Employer_ID Account_ID get_date (int)
1 1 5 20210105
2 2 8 20210104
3 1 1145 20210105
TABLE: Employee_Money_Accounts_Past 
-- Primary key: as_of_dte ASC, employer_ID ASC, account_ID ASC
Employer_ID account_ID as_of_dte (int) money
1 5 20201215 5.00
1 5 20201201 8.00
2 8 20201201 15.00

I have millions of records in Employee_Money_Accounts_Past and thousands in Employee_Money_Accounts.

I need to pull the MAX as_of_dte for each account_ID given a specific ceiling.

SELECT EP.Employer_ID, EP.Account_ID, MAX(as_of_dte) 
FROM   Employee_Money_Accounts_Past EP
INNER JOIN Employee_Money_Accounts EA ON EP.Employer_ID = EA.Employer_ID 
                                     AND EP.Account_ID  = EA.Account_ID
WHERE EP.as_of_dte <= EA.get_date
GROUP BY EP.Employer_ID, EP.Account_ID

The above query is too slow so I am wanting to write a recursive CTE (not a WHILE loop either) to handle this.

Here is what I have so far -- also super slow! Basically I want to be able to use a recursive CTE to pass one single Employer_ID and Account_ID at a time to the main query since this is fast.

;WITH EmpAccts AS (Employer_ID, Account_ID)
(
    SELECT Employer_ID, Account_ID
    FROM #Employee_Money_Accounts
    UNION ALL
    SELECT EA.Employer_ID, EA.Account_ID
    FROM EmpeAccts E
    INNER JOIN #Employee_Money_Accounts EA ON E.Employer_ID = EA.Employer_ID
                                          AND E.Account_ID = EA.Account_ID
    WHERE EA.id = EA.id + 1
)
SELECT EA2.Employer_ID, EA2.Account_ID, MAX(EP.as_of_dte)
FROM EmpAccts EA2
INNER JOIN Employee_Money_Accounts_Past EP ON EA2.Employer_ID = EP.Employer_ID 
                                          AND EA2.Account_ID = EP.Account_ID
INNER JOIN Employee_Money_Accounts EMP ON EP.Employer_ID = EMP.Employer_ID
                                      AND EP.Account_ID = EMP.Account_ID
WHERE EP.as_of_dte <= EMP.as_of_dte
GROUP BY EA2.Employer_ID, EA2.Account_ID
1

There are 1 best solutions below

1
Andreas Sundström On

Some toughts:

In the table Employee_Money_Accounts you have a column named id that seems to be an identity column. If that's the case why not make it the primary key column (and maybe add an Alternate key constraint on Employer_ID, Account_ID? Then add a FK reference in Employee_Money_Accounts_Past to the id column. I'm not sure but maybe this can help speed up the question.

Also when working with millions of rows in a table and doing these kind of questions you could consider adding a columnstore index on the Employee_Money_Accounts_Past. Columnstore index can increase performance up to 10 times. This is common in datawarehousing and direct analysis in OLTP-systems.