I am trying to get, for each value in a list of values, the distinct count of clusters that contain that value.
Dataset:
CREATE
(a1: node {tag: "a"}),
(a2: node {tag: "a"}),
(a3: node {tag: "a"}),
(a4: node {tag: "a"}),
(b1: node {tag: "b"}),
(b2: node {tag: "b"}),
(c1: node {tag: "c"}),
(a1)-[:LINKS_TO]->(a2),
(a4)-[:LINKS_TO]->(b1),
(a4)-[:LINKS_TO]->(c1)
I would like to get a distinct count of clusters for each distinct value of tag.
- a: 3.
  - There are 4 nodes that have tag: a, but they are in 3 distinct clusters: clusters 1, 2 and 3.
- b: 2.
  - b appears in 2 distinct clusters: clusters 3 and 4.
- c: 1.
  - c appears in 1 distinct cluster: cluster 3.
I attempted to get a distinct list of tag values and a list of clusters with the query below, but I am not sure how to join/link the 2 lists to get the expected distinct count.
MATCH (node)
WITH collect(distinct node.tag) AS tag_list, collect(node) AS clusters
RETURN tag_list, clusters
Many thanks in advance!

Here is a way that doesn't depend on a cluster attribute:
Result:
This works because, for each cluster that has at least one node with a given tag, only the node with the lowest id is returned, so each cluster is counted exactly once per tag.
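
For illustration, here is a minimal sketch of a query built on that lowest-id idea. It is a reconstruction under assumptions, not necessarily the exact query intended above: it assumes Neo4j 4.0 or later (for the `EXISTS { ... }` subquery), it treats `LINKS_TO` as undirected when deciding cluster membership, and the names `n`, `m` and `clusterCount` are made up for the example.

```cypher
// Keep a node only if no other node in the same cluster
// has the same tag and a lower internal id.
MATCH (n:node)
WHERE NOT EXISTS {
  // Any node reachable over LINKS_TO in either direction is in n's cluster.
  MATCH (n)-[:LINKS_TO*]-(m:node)
  WHERE m.tag = n.tag AND id(m) < id(n)
}
// Each surviving node stands for one (tag, cluster) pair,
// so counting survivors per tag gives the cluster count.
RETURN n.tag AS tag, count(*) AS clusterCount
ORDER BY tag
```

On the sample data in the question this should give a: 3, b: 2, c: 1, since a2 is dropped in favour of a1 and every other node is the only holder of its tag within its cluster. Note that `id()` is deprecated in Neo4j 5, and that an unbounded `[:LINKS_TO*]` expansion can get expensive on large, densely connected graphs.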