Update chain of rows in which each row will affect next one through a computed column

170 Views Asked by At

In a banking DB we want to hold the transactions and remainder amount of accounts after each transaction in a table named 'Operation' like this:

CREATE TABLE Operation (
    Id int PRIMARY KEY,
    Change int,
    PreviousId int null FOREIGN KEY REFERENCES Operation(id),
    PreviousAmount int null,
    NewAmount AS (COALESCE(PreviousAmount,0) + Change) PERSISTED
);

(for simplicity I removed some columns like AccountId and Date)

PreviousId holds the id of last transaction of this account and will form a chain of transactions.

With this data:

INSERT INTO Operation (Id, Change, PreviousId, PreviousAmount)
VALUES
(1, 1, null, null),
(2, 2, 1, 1),
(3, 3, 2, 3),
(4, 4, 3, 5);

So far PreviousAmount and NewAmount has valid data:

id  change  previousId  previousAmount  newAmount
-------------------------------------------------
1   1       NULL        NULL            1
2   2       1           1               3
3   3       2           3               6
4   4       3           6               10 

let's say we update first row "change":

Update Operation set Change=10 where id=1

I want a db level solution (like trigger) to update all PreviousAmount in the chain of rows so each row holds the updated "NewAmount" of previous row within its "PreviousAmount"

[Edit]

There is a solution to avoid computed column and compute NewAmount using window function in a trigger, but the reason I choose computed column is the reliability of validating non-negative newAmount.

using a window function requires that first an update take place and then another query update the newAmount column and see: if it's negative, rollback the transaction. but using a computed column and a db constraint there is no need for the second query and a transaction rollback as the db engine will do the magic.

2

There are 2 best solutions below

1
T N On

--- I will leave this up for now, not it appears from continued discussion that we are not actually dealing with a linked list, so the below may not be an appropriate solution. ---

If after considering the advice from all of the comments above, you still prefer to store the running sums, this can be done with the help of a recursive Common Table Expression (CTE).

Given the id of the inserted or updated row, the CTE can follow the links down the chain and calculate the running totals along the way. This can then be fed into an UPDATE statement that applies the changes.

For repeated use, this could be wrapped up in a stored procedure. The following procedure will accept the id of the updated record or a NULL id to indicate that all values should be recalculated from the root (rows with previousId = NULL).

CREATE Procedure UpdateChain
    @StartingId INT
AS
    WITH CTE_Operation AS (
        -- Any root operation, when  @StartingId IS NULL
        SELECT
            O.*,
            previousAmountCalc = NULL,
            newAmountCalc = O.change
        FROM Operation O
        WHERE @StartingId IS NULL AND O.previousId IS NULL
        UNION ALL
        -- Specific operation row, when  @StartingId IS NOT NULL
        SELECT -- Initial explicitly selected operation
            O.*,
            previousAmountCalc = OP.newAmount,
            newAmountCalc = ISNULL(OP.newAmount, 0) + O.change
        FROM Operation O
        LEFT JOIN Operation OP ON OP.id = O.previousId
        WHERE O.id = @StartingId
        UNION ALL
        -- Recursively follow the links
        SELECT
            O.*,
            previousAmountCalc = C.newAmountCalc,
            newAmountCalc = C.newAmountCalc + O.change
        FROM CTE_Operation C
        JOIN Operation O ON O.previousId = C.id
    )
    UPDATE O
    SET
        previousAmount = previousAmountCalc,
        newAmount = newAmountCalc
    FROM CTE_Operation C
    JOIN Operation O ON O.id = C.id

-- End of procedure

It is worth noting that the above recursive CTE does not have any logic to detect cyclic (infinite loop) link arrangements and will error out if the number of recursions exceeds 100 (controlled by the MAXRECURSION option).

Use case 1 - Initialization:

INSERT Operation (id, change, previousId)
VALUES
    (1, 1, null),
    (2, 2, 1),
    (3, 3, 2),
    (4, 4, 3)

-- Refresh all (starting with OPerations with previousId = NULL)
EXEC UpdateChain NULL

Results:

id change previousId previousAmount newAmount
1 1 null null 1
2 2 1 1 3
3 3 2 3 6
4 4 3 6 10

Use case 2 - Updated row:

UPDATE Operation
SET Change = 10
WHERE id = 1

-- Refresh everything from id = 1 and down the chain
EXEC UpdateChain 1

Results:

id change previousId previousAmount newAmount
1 10 null null 10
2 2 1 10 12
3 3 2 12 15
4 4 3 15 19

Use case 3 - Inserted row:

-- Insert new operation after id = 2
INSERT Operation (id, change, previousId)
VALUES (5, 5, 2)

-- Adjust the backlink for the following operation to point to the new entry
UPDATE Operation
SET previousId = 5
WHERE id = 3

-- Refresh everything from the prior id = 2 and down the chain
EXEC UpdateChain 2

Results:

id change previousId previousAmount newAmount
1 10 null null 10
2 2 1 10 12
5 5 2 12 17
3 3 5 17 20
4 4 3 20 24

See this db<>fiddle for a demonstration of the above.

5
T N On

Update: See the trigger based solution at the end of this post

First, your application business logic should be the first line of defense for enforcing data validity, database constraints should be used to enforce database integrity if the application fails to do its job.

I believe you can accomplish your integrity goals the implementing a table constraint that calls a function that validates the series of values for the current account. No extra columns are needed.

Something like:

CREATE FUNCTION ValidateOperationSeries (
    @AccountNo INT,
    @Change NUMERIC(19,2) -- Ignored, but must be present to catch column updates
) RETURNS BIT -- 1 = valid, 0 = invalid
AS
BEGIN
    DECLARE @MinimumAmount NUMERIC(19,2) = (
        SELECT MIN(CummulativeAmount)
        FROM (
            SELECT SUM(O.Change) OVER(ORDER BY O.OperationId) AS CummulativeAmount
            FROM Operation O
            WHERE O.AccountNo = @AccountNo
        ) S
    )
             
    DECLARE @IsValid BIT = CASE WHEN @MinimumAmount < 0 THEN 0 ELSE 1 END

    RETURN @IsValid
END

Together with a table constraint:

    CONSTRAINT CK_Operation_ValidateOperationSeries
        CHECK (dbo.ValidateOperationSeries(AccountNo, Change) = 1)

This assumes that you have a column that unambiguously defines the row ordering like id or some strictly ascending timestamp or transaction-number column. I also added an AccountNo column, to show how the constraint only considers the rows for the updated account.

Although the function does not directly use the Change parameter, it must be present in the check constraint expression to ensure that the constraint fires if that column is updated.

There are still a few logic gaps:

  1. If a row is deleted, the check constraint will not execute.
  2. If something else changes, like AccountNo, the check constraint may not revalidate all affected data. These cases could be left to the application to validate, but if really needed, triggers could still be implemented to call the validation function and take appropriate action if needed.

See this db<>fiddle for a demo that includes insert and update actions with a mix of pass and fail scenarios.

ADDENDUM:

Perhaps a cleaner solution that should catch all cases would be a trigger that identifies and validates all affected accounts.

Something like:

CREATE TRIGGER TR_Operation_ValidateOperationSeries
    ON Operation 
    AFTER INSERT, UPDATE, DELETE
AS
    DECLARE @FailedAccountNo INT = (
        SELECT TOP 1 A.AccountNo
        FROM (
            -- Get affected account numbers where some key data has changed
            SELECT DISTINCT A.AccountNo
            FROM INSERTED I
            FULL OUTER JOIN DELETED D ON D.OperationId = I.OperationId
            CROSS APPLY (
                SELECT I.AccountNo WHERE I.AccountNo IS NOT NULL
                UNION ALL
                SELECT D.AccountNo WHERE D.AccountNo IS NOT NULL
            ) A
            WHERE (
                D.OperationId IS NULL    -- Any inserted row
                OR I.OperationId IS NULL -- Any deleted row
                OR UPDATE(AccountNo)     -- PARTITION BY key changed
                OR UPDATE(OperationId)   -- The ORDER BY column changed
                OR UPDATE(Change)        -- Tracked value changed
            )
        ) A
        WHERE dbo.ValidateOperationSeries(A.AccountNo) = 0 -- Validation Fails
    )

    IF @FailedAccountNo IS NOT NULL
    BEGIN
        DECLARE @ErrorMessage VARCHAR(1000) =
            CONCAT('Operation series failed validation for account ',
                    @FailedAccountNo,
                   ' in TR_Operation_ValidateOperationSeries.');
        THROW 50000, @ErrorMessage, 1
    END

The above will extract all affected accounts and call the previously defined validation function. (This time the extra @Change parameter is eliminated.) Affected accounts include those from any inserted or deleted rows, rows where the Change column is updated, and both the before and after values if the AccountNo column changes.

For updates, account (re-)validation will only be performed is one of the key columns is changed. The above trigger must be set up to recognize any columns that affect the f=validation results. These include:

  • The PARTITION BY column - AccountNo in the test scenario
  • The ORDER BY column - OperationId in the test scenario, but will likely be a different timestamp or other strictly ascending transaction number of some type.
  • The data being tracked - Change in the test scenario

See this db<>query for the trigger-based demo.