I have the following table in SQL:
CREATE TABLE friendships
(
id INT PRIMARY KEY,
person VARCHAR(50),
friend VARCHAR(50)
);
INSERT INTO friendships (id, person, friend) VALUES
(1, 'person3', 'person9'),
(3, 'person10', 'person4'),
(4, 'person2', 'person1'),
(5, 'person6', 'person7'),
(7, 'person4', 'person10'),
(8, 'person6', 'person7'),
(10, 'person10', 'person9'),
(11, 'person5', 'person10'),
(12, 'person3', 'person7'),
(13, 'person9', 'person5'),
(14, 'person9', 'person7'),
(15, 'person9', 'person5'),
(16, 'person3', 'person6'),
(17, 'person8', 'person9'),
(18, 'person10', 'person2'),
(19, 'person7', 'person5'),
(20, 'person10', 'person8');
Using R, I can plot the table the data like this:
library(igraph)
friendships = structure(list(person = c("person3", "person10", "person2", "person6",
"person4", "person6", "person10", "person5", "person3", "person9",
"person9", "person9", "person3", "person8", "person10", "person7",
"person10"), friend = c("person9", "person4", "person1", "person7",
"person10", "person7", "person9", "person10", "person7", "person5",
"person7", "person5", "person6", "person9", "person2", "person5",
"person8")), row.names = c(1L, 3L, 4L, 5L, 7L, 8L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
g <- graph_from_data_frame(df, directed = FALSE)
plot(g, vertex.size=10, vertex.label.cex=0.8)
ONLY using SQL commands, I want to answer the following questions (in the future, generalize to the n-th degree):
- How many nodes are connected to Person10 by degree2 (i.e. friends of friends,, 8 people)
- Who are these people that are connected to person10 by degree 2? (i.e. person2, person1, person4, person8, person9, person5, person7, person3)
I am not sure how to do these in SQL - I normally use igraph for these problems.
Here is my attempt to answer the first one:
SELECT COUNT(DISTINCT f2.friend)
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
SELECT friend FROM friendships WHERE person = 'person10'
);
#result
3
Here is my attempt to answer the second one:
SELECT DISTINCT f2.friend
FROM friendships f1
JOIN friendships f2 ON f1.friend = f2.person
WHERE f1.person = 'person10' AND f2.friend != 'person10' AND f2.friend NOT IN (
SELECT friend FROM friendships WHERE person = 'person10'
);
#result
1 person5
2 person7
3 person1
However these are not the correct results.
Can someone please show me how to do this correctly? Perhaps a loop/function can be written in R that executes the SQL until termination?
I am using an older version of SQL that doesn't have recursive functions.
Thanks!

Consider this recursive example.
There condition
where person='person10'for select one person, otherwise - find level2 friends for all persons.Condition
lvl<2- count friends for level 1 and level 2. You can set any possible level.And
pathcolumn only for clarity, if graph not have cycles.Result for all person
For person='person10' cte
rbypasses the graph along all paths.AllFriendsforperson10(level up 2) and count of path for friendThere 2 path's (10-4),(4-10) counted as 1 path - in first cte we use
union(notunion all).For pair (10,9) there 4 path's (10-9),(10-8-9),(10-5-9) and (10-5-9), on other levels we not exclude paths (
union all). Possible left only unique path's by grouping.Output for lvl=1
Demo here
Example for SQL server. Virtually unchanged (varchar(1000)->nchar(1000)) suitable for MySQL.
Unfortunately, I don't know how to do extended recursion in postgresql like (select .. union all select ... union all select ...)
We can do this in postgresql by making it more difficult join anchor and the table.
Upd1. If graph have cycles, we mast check - we have already passed this node - is there a node in the
path? Such a check depends somewhat on the DBMS used (instr,patindex ...). In this case, the path is mandatory.