I am having a hard time figuring out how to write a recursive CTE for my use-case.
**TEMP** TABLE (CAN BE MODIFIED): Employee_Money_Accounts
-- Primary key on Employer_ID, Account_ID
| id | Employer_ID | Account_ID | get_date (int) |
|---|---|---|---|
| 1 | 1 | 5 | 20210105 |
| 2 | 2 | 8 | 20210104 |
| 3 | 1 | 1145 | 20210105 |
TABLE: Employee_Money_Accounts_Past
-- Primary key: as_of_dte ASC, employer_ID ASC, account_ID ASC
| Employer_ID | account_ID | as_of_dte (int) | money |
|---|---|---|---|
| 1 | 5 | 20201215 | 5.00 |
| 1 | 5 | 20201201 | 8.00 |
| 2 | 8 | 20201201 | 15.00 |
I have millions of records in Employee_Money_Accounts_Past and thousands in Employee_Money_Accounts.
I need to pull the MAX as_of_dte for each account_ID given a specific ceiling.
SELECT EP.Employer_ID, EP.Account_ID, MAX(as_of_dte)
FROM Employee_Money_Accounts_Past EP
INNER JOIN Employee_Money_Accounts EA ON EP.Employer_ID = EA.Employer_ID
AND EP.Account_ID = EA.Account_ID
WHERE EP.as_of_dte <= EA.get_date
GROUP BY EP.Employer_ID, EP.Account_ID
The above query is too slow so I am wanting to write a recursive CTE (not a WHILE loop either) to handle this.
Here is what I have so far -- also super slow! Basically I want to be able to use a recursive CTE to pass one single Employer_ID and Account_ID at a time to the main query since this is fast.
;WITH EmpAccts AS (Employer_ID, Account_ID)
(
SELECT Employer_ID, Account_ID
FROM #Employee_Money_Accounts
UNION ALL
SELECT EA.Employer_ID, EA.Account_ID
FROM EmpeAccts E
INNER JOIN #Employee_Money_Accounts EA ON E.Employer_ID = EA.Employer_ID
AND E.Account_ID = EA.Account_ID
WHERE EA.id = EA.id + 1
)
SELECT EA2.Employer_ID, EA2.Account_ID, MAX(EP.as_of_dte)
FROM EmpAccts EA2
INNER JOIN Employee_Money_Accounts_Past EP ON EA2.Employer_ID = EP.Employer_ID
AND EA2.Account_ID = EP.Account_ID
INNER JOIN Employee_Money_Accounts EMP ON EP.Employer_ID = EMP.Employer_ID
AND EP.Account_ID = EMP.Account_ID
WHERE EP.as_of_dte <= EMP.as_of_dte
GROUP BY EA2.Employer_ID, EA2.Account_ID
Some toughts:
In the table
Employee_Money_Accountsyou have a column namedidthat seems to be an identity column. If that's the case why not make it the primary key column (and maybe add an Alternate key constraint on Employer_ID, Account_ID? Then add a FK reference inEmployee_Money_Accounts_Pastto theidcolumn. I'm not sure but maybe this can help speed up the question.Also when working with millions of rows in a table and doing these kind of questions you could consider adding a columnstore index on the
Employee_Money_Accounts_Past. Columnstore index can increase performance up to 10 times. This is common in datawarehousing and direct analysis in OLTP-systems.