I have two table schemas (MySQL 5.6 so no CTE), roughly looking like this:
CREATE TABLE nodes (
node_id INT PRIMARY KEY,
name VARCHAR(10)
);
CREATE TABLE edges (
edge_id INT PRIMARY KEY,
source INT,
target INT,
FOREIGN KEY (source) REFERENCES nodes(node_id),
FOREIGN KEY (target) REFERENCES nodes(node_id)
);
In our design, a logical edge between two nodes (logically n1 -> n2) is actually represented as (n1 -> proxy node -> n2) in the db. The reason we use two edges and a proxy node for a logical edge is so that we can store properties on the edge. Therefore, when a client queries for two nodes connected by an edge, the query is translated to query three connected nodes instead.
I have written a query to get a path with a fixed length. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with exactly 5 edges on the path." This is done without using recursion on the SQL side; I just generate a long query programmatically with the specified fixed length.
The challenge is, we want to support querying of a variable-length path. For example, "give me all the paths that start with a node with some properties, and end with a node with some properties, with no fewer than 3 edges and no more than 10 edges on the path." Is this feasible without (or even with) CTE?
EDIT:
Some sample data:
-- Logical nodes are 1, 3, 5, 7, 9, 11. The rest are proxy nodes.
INSERT INTO nodes VALUES
(1, 'foo'),
(2, '_proxy_'),
(3, 'foo'),
(4, '_proxy_'),
(5, 'bar'),
(6, '_proxy_'),
(7, 'bar'),
(8, '_proxy_'),
(9, 'bar'),
(10, '_proxy_'),
(11, 'bar');
-- Connects 1 -> 2 -> ... -> 11.
INSERT INTO edges VALUES
(1, 1, 2),
(2, 2, 3),
(3, 3, 4),
(4, 4, 5),
(5, 5, 6),
(6, 6, 7),
(7, 7, 8),
(8, 8, 9),
(9, 9, 10),
(10, 10, 11);
The query can be, "select the ID and names of all the nodes on a path such that the path starts with a node named 'foo' and ends with a node named 'bar', with at least 2 nodes and at most 4 nodes on the path." Such paths include 1 -> 3 -> 5, 1 -> 3 -> 5 -> 7, 3 -> 5, 3 -> 5 -> 7, and 3 -> 5 -> 7 -> 9. So the result set should include the IDs and names of nodes 1, 3, 5, 7, 9.
The following query returns all paths of interest in comma separated strings.
The result looks like this:
View on DB Fiddle
You can now parse the strings in application code and merge/union the arrays. If you only want the contained node ids, you can also change the outer query to:
Result:
View on DB Fiddle
Note that a JOIN on
FIND_IN_SET()might be slow, ifrctecontains too many rows. I would rather do this step in application code, which should be quite simple in a procedural language.MySQL 5.6 solution
Prior to MySQL 8.0 and MariaDB 10.2 there was no way for recursions. Farther there are many other limitations, which make a workaround difficult. For example:
memmoryengineHowever - an RCTE can be emulated in a stored procedure moving rows between two (temporary) tables. The following procedure does that:
Use it as:
Result:
View on DB Fiddle
This is far from being optimal. If the result has many or long paths, you might need to define some indexes on the temprary tables. Also I don't like the idea of creating (temporary) tables in stroed procedures. See it as "proof of concept". Use it on your own risk.