I would need a help for the sql syntax which would give me the total number of employees for the selected period in the SCD2 table. Examples are provided below - the thing is that i get the inconsistent results for the totals, i guess due to the fact that i'm using "FlagIsCurrent" variable. Since the variable "ValidFrom" and "ValidTo" refer to the period when somebody worked in the specific store or quit, would it be possible to get the consistent results with those variables? "FlagIsCurrent" only refers to the current state of affairs and i want to go "into past".
Any ideas?
Table looks similar like this:
| EmployeeID | StoreID | EmploymentDate | TerminationDate | ValidFrom | ValidTo | FlagIsCurrent | LoadDateTime |
|---|---|---|---|---|---|---|---|
| e01 | 5 | 2010-10-01 | 1900-01-01 | 2010-10-01 | 9999-12-31 | 1 | 2023-01-05 23:00:27.543 |
| e02 | 4 | 2022-10-10 | 1900-01-01 | 2022-10-10 | 2022-12-10 | 0 | 2023-01-01 23:00:27.543 |
| e02 | 5 | 2022-10-10 | 1900-01-01 | 2022-12-13 | 9999-12-31 | 1 | 2023-01-05 23:00:27.543 |
| e03 | 3 | 2023-02-20 | 1900-01-01 | 2023-02-20 | 9999-12-31 | 1 | 2023-01-05 23:00:27.543 |
| e04 | 3 | 2022-08-25 | 1900-01-01 | 2022-08-25 | 2022-12-04 | 0 | 2023-01-01 23:00:27.543 |
| e04 | 3 | 2022-08-25 | 2022-12-05 | 2022-12-05 | 9999-12-31 | 1 | 2023-01-05 23:00:27.543 |
Hires for the selected period (works ok):
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [EmploymentDate]>=MIN(Calendar[Date]) AND [EmploymentDate]<=MAX(Calendar[Date])
...or for the specific month:
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [EmploymentDate]>='2023-02-01' AND [EmploymentDate]<='2023-02-28'
Terminations for the selected period (works ok):
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [TerminationDate]<>'1900-01-01' AND [TerminationDate]>=MIN(Calendar[Date]) AND [TerminationDate]<=MAX(Calendar[Date])
...or for the specific month:
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [TerminationDate]<>'1900-01-01' AND [TerminationDate]>='2023-02-01' AND [TerminationDate]<='2023-02-28'
Totals for the selected period (inconsistent results):
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [FlagIsCurrent]='1'
AND
(
([TerminationDate]='1900-01-01' AND [EmploymentDate]<=MAX(Calendar[Date])
OR
([EmploymentDate]<=MAX(Calendar[Date]) AND [TerminationDate]>=MIN(Calendar[Date])
))
...or for the specific month:
SELECT COUNT (DISTINCT [EmployeeID])
FROM [import].[hr].[Employees]
WHERE [FlagIsCurrent]='1'
AND
(
([TerminationDate]='1900-01-01' AND [EmploymentDate]<='2023-02-28'
OR
([EmploymentDate]<='2023-02-28') AND [TerminationDate]>='2023-02-01'
))
UPDATE
Is it safe to assume that the total number of employees for the selected period would be the following:
SELECT COUNT ([EmployeeID]) FROM [import].[hr].[Employees] WHERE [TerminationDate] IS '1900-01-01' AND [ValidFrom]<=MAX(Calendar[Date]) AND [ValidTo]>=MIN(Calendar[Date])
...or for the last February that would be:
SELECT COUNT ([EmployeeID]) FROM [import].[hr].[Employees] WHERE [TerminationDate] IS '1900-01-01' AND [ValidFrom]<='2023-02-28' AND [ValidTo]>='2023-02-01'