Updating EndDate in DimEmployee table

99 Views Asked by At

I have been working on a script to create a Type2 DimEmployee Table. I want it to create a new record every time there is a change to the employees details.

I believe I have this part working correctly. What I am stuck on is correctly updating the EndDate for records that have changed.

It works were records have not changed the EndDate is NULL it also works for records were one month it was this and then the next month it was that (it inserts the correct EndDate value).

However where it is not working is when there has been multiple identical records in the Staging_Employees and then there has been a change. It does not use the correct EndDate value.

P.S the Date feild in the staging table acts as a snapshot date

can anyone help me with this?

Thanks

For example:

[Sample Staging_Employees Data](https://i.stack.imgur.com/GzL8X.png)

[Current DimEmployee Results](https://i.stack.imgur.com/ioxIj.png)

[Expected DimEmployee Results](https://i.stack.imgur.com/zMgsx.png)

As you can see, the EndDate for the first record in the DimEmployee table is now 28/02/2023, the date of the last identical record before a different record appears for StaffNo 4078. For the second record, EndDate remains NULL as there are no more records for StaffNo 4078 after it.

Code Used:

BEGIN TRY
    TRUNCATE TABLE DimEmployee;
END TRY
BEGIN CATCH
    CREATE TABLE DimEmployee (
        DimEmployeeID INT IDENTITY(1,1) PRIMARY KEY,
        StructureID INT,
        StaffNo INT NOT NULL,
        EmployeeID INT,
        Position varchar(20),
        JobTitle VARCHAR(100),
        ContractType VARCHAR(50),
        AverageHoursPerWeek DECIMAL(5,2),
        WeeksPeryr DECIMAL(5,2),
        HoursPerWeek DECIMAL(5,2),
        PublicHolidayZone VARCHAR(50),
        FTE DECIMAL(5,2),
        AnalysisGroup VARCHAR(50),
        EffectiveDate DATE,
        EndDate DATE
    );
END CATCH;

-- Insert new records with changes in the specified fields
WITH ChangedRecords AS (
    SELECT
        s.StaffNo,
        RANK() OVER (ORDER BY s.StaffNo) AS EmployeeID,
        s.Position, 
        s.JobTitle,
        s.ContractType,
        s.AverageHoursPerWeek,
        s.WeeksPeryr,
        s.HoursPerWeek,
        s.PublicHolidayZone,
        s.FTE,
        s.AnalysisGroup,
        d.StructureID,
        s.Date,
        LAG(d.StructureID) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousStructureID,
        LAG(s.Position) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousPosition,
        LAG(s.JobTitle) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousJobTitle,
        LAG(s.ContractType) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousContractType,
        LAG(s.AverageHoursPerWeek) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousAverageHoursPerWeek,
        LAG(s.WeeksPeryr) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousWeeksPeryr,
        LAG(s.HoursPerWeek) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousHoursPerWeek,
        LAG(s.PublicHolidayZone) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousPublicHolidayZone,
        LAG(s.FTE) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousFTE,
        LAG(s.AnalysisGroup) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousAnalysisGroup,
        ROW_NUMBER() OVER (PARTITION BY s.StaffNo, YEAR(s.Date), MONTH(s.Date), DAY(s.Date) ORDER BY s.Date) AS RowNum
    FROM Staging_Employees AS s
    JOIN DimStructure AS d ON
        s.Directorate = d.Directorate AND
        s.Service = d.Service AND
        s.Section = d.Section AND
        s.Team = d.Team
)

INSERT INTO DimEmployee (StructureID, StaffNo, EmployeeID, Position, JobTitle, ContractType, AverageHoursPerWeek, WeeksPeryr, HoursPerWeek, PublicHolidayZone, FTE, AnalysisGroup, EffectiveDate, EndDate)
SELECT StructureID, StaffNo, EmployeeID, Position, JobTitle, ContractType, AverageHoursPerWeek, WeeksPeryr, HoursPerWeek, PublicHolidayZone, FTE, AnalysisGroup, Date, NULL
FROM ChangedRecords
WHERE (PreviousStructureID IS NULL OR PreviousStructureID <> StructureID
    OR PreviousPosition <> Position
    OR PreviousJobTitle <> JobTitle
    OR PreviousContractType <> ContractType
    OR PreviousAverageHoursPerWeek <> AverageHoursPerWeek
    OR PreviousWeeksPeryr <> WeeksPeryr
    OR PreviousHoursPerWeek <> HoursPerWeek
    OR PreviousPublicHolidayZone <> PublicHolidayZone
    OR PreviousFTE <> FTE
    OR PreviousAnalysisGroup <> AnalysisGroup);

-- Update EndDate for old records with changes in the specified fields
WITH ChangedRecords AS (
    SELECT
        s.StaffNo,
        RANK() OVER (ORDER BY s.StaffNo) AS EmployeeID,
        s.Position, 
        s.JobTitle,
        s.ContractType,
        s.AverageHoursPerWeek,
        s.WeeksPeryr,
        s.HoursPerWeek,
        s.PublicHolidayZone,
        s.FTE,
        s.AnalysisGroup,
        d.StructureID,
        s.Date,
        LAG(d.StructureID) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousStructureID,
        LAG(s.Position) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousPosition,
        LAG(s.JobTitle) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousJobTitle,
        LAG(s.ContractType) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousContractType,
        LAG(s.AverageHoursPerWeek) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousAverageHoursPerWeek,
        LAG(s.WeeksPeryr) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousWeeksPeryr,
        LAG(s.HoursPerWeek) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousHoursPerWeek,
        LAG(s.PublicHolidayZone) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousPublicHolidayZone,
        LAG(s.FTE) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousFTE,
        LAG(s.AnalysisGroup) OVER (PARTITION BY s.StaffNo ORDER BY s.Date) AS PreviousAnalysisGroup
    FROM Staging_Employees AS s
    JOIN DimStructure AS d ON
        s.Directorate = d.Directorate AND
        s.Service = d.Service AND
        s.Section = d.Section AND
        s.Team = d.Team
)
UPDATE de
SET EndDate = (
    SELECT MAX(se.Date)
    FROM Staging_Employees se
    WHERE de.StaffNo = se.StaffNo
    AND se.Date <= de.EffectiveDate
)
FROM DimEmployee de
WHERE de.EndDate IS NULL
AND EXISTS (
    SELECT 1
    FROM ChangedRecords cr
    WHERE cr.StaffNo = de.StaffNo
    AND cr.EmployeeID = de.EmployeeID
    AND cr.Date > de.EffectiveDate
    AND (
        cr.PreviousStructureID <> cr.StructureID
        OR cr.PreviousPosition <> cr.Position
        OR cr.PreviousJobTitle <> cr.JobTitle
        OR cr.PreviousContractType <> cr.ContractType
        OR cr.PreviousAverageHoursPerWeek <> cr.AverageHoursPerWeek
        OR cr.PreviousWeeksPeryr <> cr.WeeksPeryr
        OR cr.PreviousHoursPerWeek <> cr.HoursPerWeek
        OR cr.PreviousPublicHolidayZone <> cr.PublicHolidayZone
        OR cr.PreviousFTE <> cr.FTE
        OR cr.PreviousAnalysisGroup <> cr.AnalysisGroup
    )
);

1

There are 1 best solutions below

0
Amira Bedhiafi On BEST ANSWER

Based on what you provided, the issue with updating the EndDate for records that have changed multiple times may be related to the subquery you are using to get the new EndDate since it seems that it only considers the maximum date from the Staging_Employees table that is less than or equal to the EffectiveDate of the record being updated.

This subquery may not take into account the multiple changes that might have occurred within that range.

To resolve this issue, you need to modify the subquery to consider the maximum date that is both less than or equal to the EffectiveDate and also greater than the previous EndDate of the record being updated. This will ensure that you're selecting the correct EndDate for records that have multiple changes.

 -- Update EndDate for old records with changes in the specified fields
    WITH ChangedRecords AS (
        -- Same as before
    )
    UPDATE de
    SET EndDate = (
        SELECT MAX(se.Date)
        FROM Staging_Employees se
        WHERE de.StaffNo = se.StaffNo
        AND se.Date <= de.EffectiveDate
        AND se.Date > COALESCE(de.EndDate, '1900-01-01') -- Consider dates after the previous EndDate
    )
    FROM DimEmployee de
    WHERE de.EndDate IS NULL
    AND EXISTS (
        SELECT 1
        FROM ChangedRecords cr
        WHERE cr.StaffNo = de.StaffNo
        AND cr.EmployeeID = de.EmployeeID
        AND cr.Date > de.EffectiveDate
        AND (
            cr.PreviousStructureID <> cr.StructureID
            OR cr.PreviousPosition <> cr.Position
            OR cr.PreviousJobTitle <> cr.JobTitle
            OR cr.PreviousContractType <> cr.ContractType
            OR cr.PreviousAverageHoursPerWeek <> cr.AverageHoursPerWeek
            OR cr.PreviousWeeksPeryr <> cr.WeeksPeryr
            OR cr.PreviousHoursPerWeek <> cr.HoursPerWeek
            OR cr.PreviousPublicHolidayZone <> cr.PublicHolidayZone
            OR cr.PreviousFTE <> cr.FTE
            OR cr.PreviousAnalysisGroup <> cr.AnalysisGroup
        )
    );