How to prevent insertion of cyclic reference in SQL

2.3k Views Asked by At

I have the following table:

create table dbo.Link
(
    FromNodeId int not null,
    ToNodeId int not null
)

Rows in this table represent links between nodes.

I want to prevent inserts or updates to this table from creating a cyclic relationship between nodes.

So if the table contains:

(1,2)
(2,3)

it should not be allowed to contain any of the following:

(1,1)
(2,1)
(3,1)

I'm happy to treat (1,1) separately (e.g. using a CHECK CONSTRAINT) if it makes the solution more straightforward.

I was thinking of creating an AFTER INSERT trigger with a recursive CTE (though there may be an easier way to do it).

Assuming this is the way to go, what would the trigger definition be? If there is a more elegant way, what is it?

2

There are 2 best solutions below

7
TT. On BEST ANSWER

Note first that it is preferable to detect cycles in another environment as recursive CTEs aren't known for their good performance and neither is a trigger that would run for each insert statement. For large graphs, a solution based on the solution below will likely be inefficient.


Suppose you create the table as follows:

CREATE TABLE dbo.lnk (
    node_from INT NOT NULL,
    node_to INT NOT NULL,
    CONSTRAINT CHK_self_link CHECK (node_from<>node_to),
    CONSTRAINT PK_lnk_node_from_node_to PRIMARY KEY(node_from,node_to)
);

That would block inserts with node_from equal to node_to, and for rows that already exist.

The following trigger should detect cyclic references by throwing an exception if a cyclic reference is detected:

CREATE TRIGGER TRG_no_circulars_on_lnk ON dbo.lnk AFTER INSERT
AS
BEGIN
    DECLARE @cd INT;
    WITH det_path AS (
        SELECT
            anchor=i.node_from,
            node_to=l.node_to,
            is_cycle=CASE WHEN i.node_from/*anchor*/=l.node_to THEN 1 ELSE 0 END
        FROM
            inserted AS i
            INNER JOIN dbo.lnk AS l ON
                l.node_from=i.node_to
        UNION ALL
        SELECT
            dp.anchor,
            node_to=l.node_to,
            is_cycle=CASE WHEN dp.anchor=l.node_to THEN 1 ELSE 0 END
        FROM
            det_path AS dp
            INNER JOIN dbo.lnk AS l ON
                l.node_from=dp.node_to
        WHERE
            dp.is_cycle=0
    )
    SELECT TOP 1
        @cd=is_cycle 
    FROM 
        det_path
    WHERE
        is_cycle=1
    OPTION 
        (MAXRECURSION 0);

    IF @cd IS NOT NULL 
        THROW 67890, 'Insert would cause cyclic reference', 1;
END

I tested this for a limited number of inserts.

INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,2); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- OK
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,4); -- OK

And

INSERT INTO dbo.lnk(node_from,node_to)VALUES(2,3); -- PK violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(1,1); -- Check constraint violation
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,2); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(3,1); -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,1); -- Exception: Insert would cause cyclic reference

It also detects cyclic references already present in the inserted rows if inserting more than one row at once, or if a path longer than one edge would be introduced in the graph. Going off on the same initial inserts:

INSERT INTO dbo.lnk(node_from,node_to)VALUES(8,9),(9,8);       -- Exception: Insert would cause cyclic reference
INSERT INTO dbo.lnk(node_from,node_to)VALUES(4,5),(5,6),(6,1); -- Exception: Insert would cause cyclic reference
11
MtwStark On

EDIT: handle multi-record inserts, moved logic in separate function

I have considered a procedural approach, it is very fast and almost independent from number of records in link table and graph "density"

I have tested it on a table with 10'000 links with nodes values from 1 to 1000.
It is really really fast an do not suffer of link table dimension or "density"

In addition, the function could be used to test values before insert or (for example) if you don't want to use a trigger at all an move test logic to the client.

Consideration on Recursive CTE: BE CAREFUL!
I have tested the accepted answer on my test table (10k rows) but after 25 minutes, I have cancelled the insert operation of one single row because the query was hung with no result... Downsizing the table to 5k rows the insert of a single record can last up to 2-3 minutes. It is very dependant from the "population" of the graph. If you insert a new path, or you are adding a node to a path with low "ramification" it is quite fast but you have no control over that. When the graph will be more "dense" this solution will blow up in your face.

Consider your needs very carefully.

So, let's see how to..

First of all I have set the PK of table to both columns and added an index on second column for full coverage. (the CHECK on FromNodeId<>ToNodeId is not needed cause the algorithm already cover this case).

CREATE TABLE [dbo].[Link](
    [FromNodeId] [int] NOT NULL,
    [ToNodeId] [int] NOT NULL,
 CONSTRAINT [PK_Link] PRIMARY KEY CLUSTERED ([FromNodeId],[ToNodeId])
) 
GO

CREATE NONCLUSTERED INDEX [ToNodeId] ON [dbo].[Link] ([ToNodeId])
GO

Then I have built a function to test the validity of a single link:

drop function fn_test_link
go
create function fn_test_link(@f int, @t int)
returns int
as
begin
    --SET NOCOUNT ON

    declare @p table (id int identity primary key, l int, t int, unique (l,t,id))
    declare @r int = 0
    declare @i int = 0

    -- link is not self-referencing
    if @f<>@t begin 
        -- there are links that starts from where new link wants to end (possible cycle)
        if exists(select 1 from link where fromnodeid=@t) begin

            -- PAY ATTENTION.. HERE LINK TABLE ALREADY HAVE ALL RECORDS ADDED (ALSO NEW ONES IF PROCEDURE IS CALLED FROM A TRIGGER AFTER INSERT)

            -- LOAD ALL THE PATHS TOUCHED BY DESTINATION OF TEST NODE
            set @i = 0
            insert into @p 
            select distinct @i, ToNodeId 
            from link 
            where fromnodeid=@t

            set @i = 1
            -- THERE IS AT LEAST A STEP TO FOLLOW DOWN THE PATHS
            while exists(select 1 from @p where l=@i-1) begin

                -- LOAD THE NEXT STEP FOR ALL THE PATHS TOUCHED
                insert into @p 
                select distinct @i, l.ToNodeId
                from link l
                join @p p on p.l = @i-1 and p.t = l.fromnodeid

                -- CHECK IF THIS STEP HAVE REACHED THE TEST NODE START
                if exists(select 1 from @p where l=@i and t=@f) begin
                    -- WE ARE EATING OUR OWN TAIL! CIRCULAR REFERENCE FOUND
                    set @r = -1
                    break
                end

                -- THE NODE IS STILL GOOD
                -- DELETE FROM LIST DUPLICATED ALREADY TESTED PATHS
                -- (THIS IS A BIG OPTIMIZATION, WHEN PATHS CROSSES EACH OTHER YOU RISK TO TEST MANY TIMES SAME PATHS)
                delete p 
                from @p p 
                where l = @i 
                and (exists(select 1 from @p px where px.l < p.l and px.t = p.t)) 

                set @i = @i + 1
            end
            if @r<0
                -- a circular reference was found 
                set @r = 0
            else
                -- no circular reference was found
                set @r = 1

        end else begin 
            -- THERE ARE NO LINKS THAT STARTS FROM TESTED NODE DESTINATIO (CIRCULAR REFERENCE NOT POSSIBLE)
            set @r = 1
        end
    end; -- link is not self-referencing

    --select * from @p 

    return @r

end
GO

Now let's call it from a trigger.
If more than a row will be inserted the trigger will test each link against the whole insert (old table + new recs), if all are valid and the final table will be consistent the insert will complete, if one of them is not valid the insert will abort.

DROP TRIGGER tr_test_circular_reference 
GO
CREATE TRIGGER tr_test_circular_reference ON link AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON

    declare @p table (id int identity primary key, l int, f int, t int)
    declare @f int = 0
    declare @t int = 0

    declare @n int = 0
    declare @i int = 1

    declare @ins table (id int identity primary key, f int, t int)

    insert into @ins select * from inserted     

    set @n = @@ROWCOUNT;

    -- there are links to insert
    while @i<=@n begin

        -- load link
        select @f=f, @t=t from @ins where id = @i

        if dbo.fn_test_link(@f, @t)=0 begin
            declare @m nvarchar(255)                    
            set @m = formatmessage('Insertion of link (%d,%d) would cause circular reference (n.%d)', @f, @t, @i);
            THROW 50000, @m, 1
        end

        set @i = @i + 1
    end

END
GO

I hope this will help