Why are duplicate values generated with multiple matches in Neo4j?


Let there be 6 nodes total, with two labels (time1 and time2), with 3 nodes in each label.

When I run: match (a:time1) return collect(a) I get a list of the 3 nodes labeled time1, which is expected.

However, when I run: match (a:time1), (b:time2) return collect(a), collect(b) I expect to get 2 lists with 3 elements each. Instead, I get 2 lists with 9 elements each, in which every node appears 3 times. The following is the CSV export of the output:

collect(a),collect(b)

"[(:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5})]","[(:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5})]"

I don't know where to start to fix the issue. I think I misunderstand how multiple matches and/or the return clause work. Any insight is greatly appreciated.

Answer by Christophe Willemsen

Just like in SQL, query parts that are not connected to each other produce a Cartesian product (cross join).

Take the following simple graph of 2 persons and 2 movies:

CREATE (:Person {name: "Neo"}), (:Person {name: "Trinity"})
CREATE (:Movie {title: "Matrix"}), (:Movie {title: "The Matrix, Reloaded"})

If you match all persons and all movies in one pattern, the query returns every combination of the two:

MATCH (p:Person), (m:Movie)
RETURN p.name, m.title

╒═════════╤══════════════════════╕
│p.name   │m.title               │
╞═════════╪══════════════════════╡
│"Neo"    │"Matrix"              │
├─────────┼──────────────────────┤
│"Neo"    │"The Matrix, Reloaded"│
├─────────┼──────────────────────┤
│"Trinity"│"Matrix"              │
├─────────┼──────────────────────┤
│"Trinity"│"The Matrix, Reloaded"│
└─────────┴──────────────────────┘

You can imagine how large the Cartesian product gets on bigger graphs.
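Applied to the question, a quick way to see the product is to count the rows the pattern produces before any aggregation. This is a minimal sketch that assumes the time1 and time2 labels from the question, with 3 nodes each; the alias row_count is just an illustrative name.

```cypher
// Each time1 node is paired with each time2 node: 3 x 3 = 9 rows.
MATCH (a:time1), (b:time2)
RETURN count(*) AS row_count
```

With 3 nodes per label this should return 9. collect(a) and collect(b) each gather one entry per row, which is why both lists in the question have 9 elements.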

You would generally break the query into parts with a WITH clause, aggregating each part before matching the next:

MATCH (p:Person)
WITH collect(p.name) AS persons
MATCH (m:Movie)
RETURN persons, collect(m.title) AS movies

╒══════════════════╤══════════════════════════════════╕
│persons           │movies                            │
╞══════════════════╪══════════════════════════════════╡
│["Neo", "Trinity"]│["Matrix", "The Matrix, Reloaded"]│
└──────────────────┴──────────────────────────────────┘
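The same shape should work for the query in the question. This is a sketch assuming the time1 and time2 labels from the question; the aliases time1_nodes and time2_nodes are made up for illustration.

```cypher
// Aggregate the time1 nodes first, so only a single row is carried forward.
MATCH (a:time1)
WITH collect(a) AS time1_nodes
// The second MATCH now runs against that single row, not once per time1 node.
MATCH (b:time2)
RETURN time1_nodes, collect(b) AS time2_nodes
```

One caveat: if no time2 nodes exist, the second MATCH finds nothing and the query returns no rows at all rather than an empty list. Using OPTIONAL MATCH for the second part should avoid that, since collect() skips nulls.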

Or use more modern Cypher, such as COLLECT subqueries:

RETURN 
    COLLECT {
        MATCH (p:Person)
        RETURN p.name
    } AS persons,
    COLLECT {
        MATCH (m:Movie)
        RETURN m.title
    } AS movies

╒══════════════════╤══════════════════════════════════╕
│persons           │movies                            │
╞══════════════════╪══════════════════════════════════╡
│["Neo", "Trinity"]│["Matrix", "The Matrix, Reloaded"]│
└──────────────────┴──────────────────────────────────┘
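If restructuring the query is not an option, collect(DISTINCT ...) also removes the repeated entries. This is a sketch assuming the labels from the question, with illustrative aliases. Note that the Cartesian product is still built before the duplicates are dropped, so this hides the symptom rather than avoiding the work.

```cypher
// Still produces 3 x 3 = 9 rows; DISTINCT keeps each node once per list.
MATCH (a:time1), (b:time2)
RETURN collect(DISTINCT a) AS time1_nodes,
       collect(DISTINCT b) AS time2_nodes
```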