Why is there duplicate value being generated with multiple matches in neo4j?

Question


Asked by Novabro on 28 February 2024 at 17:02

Suppose there are 6 nodes in total across two labels (time1 and time2), with 3 nodes per label.

When I run: match (a:time1) return collect(a) I get a list of the 3 nodes labelled time1, which is expected.

However, when I run: match (a:time1), (b:time2) return collect(a), collect(b) I expect to get 2 lists with 3 elements each. Instead, I get 2 lists with 9 elements each, in which every node appears 3 times. The following is the CSV export of the output:

collect(a),collect(b)

"[(:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time1 {con1: 0.0,con2: 0.0,con3: 0.5})]","[(:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5}), (:time2 {con1: 0.9,con2: 0.0,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.4,con3: 0.0}), (:time2 {con1: 0.0,con2: 0.0,con3: 0.5})]"

I don't know where to start to fix the issue. I think my understanding of how multiple matches and/or the return clause work is wrong. Any insight is greatly appreciated.



**Christophe Willemsen** · Answer 1 · 2024-02-28T20:25:59.040000

Just like in SQL, query parts that are not connected to each other produce a cross (Cartesian) product.

Take the following simple graph with 2 persons and 2 movies:

CREATE (:Person {name: "Neo"}), (:Person {name: "Trinity"})
CREATE (:Movie {title: "Matrix"}), (:Movie {title: "The Matrix, Reloaded"})

If you query all persons and movies, it will generate every combination of them:

MATCH (p:Person), (m:Movie)
RETURN p.name, m.title

╒═════════╤══════════════════════╕
│p.name   │m.title               │
╞═════════╪══════════════════════╡
│"Neo"    │"Matrix"              │
├─────────┼──────────────────────┤
│"Neo"    │"The Matrix, Reloaded"│
├─────────┼──────────────────────┤
│"Trinity"│"Matrix"              │
├─────────┼──────────────────────┤
│"Trinity"│"The Matrix, Reloaded"│
└─────────┴──────────────────────┘

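You can imagine how large the Cartesian product becomes on bigger graphs. In your case, 3 time1 nodes x 3 time2 nodes give 9 rows, and each collect() aggregates over all 9 of them, which is why every node appears 3 times.

You would generally break the query up with a WITH clause, aggregating the first match before starting the second: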
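To make the row counting concrete for the graph in the question, here is a minimal Python sketch. It is only a simplified model of how Cypher passes rows between clauses, not how Neo4j actually executes the query, and the node names (a1, b1, ...) are hypothetical stand-ins for the three :time1 and three :time2 nodes.

```python
from itertools import product

# Hypothetical stand-ins for the three :time1 and three :time2 nodes.
time1 = ["a1", "a2", "a3"]
time2 = ["b1", "b2", "b3"]

# MATCH (a:time1), (b:time2): the two patterns share no variable,
# so every a is paired with every b (3 x 3 = 9 rows).
rows = list(product(time1, time2))

# RETURN collect(a), collect(b): with no grouping key, each aggregate
# runs over all 9 rows, so every node shows up three times.
collected_a = [a for a, _ in rows]
collected_b = [b for _, b in rows]

print(len(rows))    # 9
print(collected_a)  # ['a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a3', 'a3', 'a3']
print(collected_b)  # ['b1', 'b2', 'b3', 'b1', 'b2', 'b3', 'b1', 'b2', 'b3']
```

The repetition pattern matches the CSV in the question: each time1 node repeats three times in a row, while the time2 nodes cycle.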

MATCH (p:Person)
WITH collect(p.name) AS persons
MATCH (m:Movie)
RETURN persons, collect(m.title) AS movies

╒══════════════════╤══════════════════════════════════╕
│persons           │movies                            │
╞══════════════════╪══════════════════════════════════╡
│["Neo", "Trinity"]│["Matrix", "The Matrix, Reloaded"]│
└──────────────────┴──────────────────────────────────┘
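Why the WITH version avoids the duplicates can be sketched the same way. Again, this is a rough Python model of Cypher's row semantics under simplifying assumptions, not Neo4j's implementation. The node names and the variable as_list are made up for illustration, applied to the time1/time2 graph from the question.

```python
# Hypothetical stand-ins for the nodes in the question.
time1 = ["a1", "a2", "a3"]
time2 = ["b1", "b2", "b3"]

# MATCH (a:time1) WITH collect(a) AS as_list
# Aggregating without a grouping key collapses the three rows into one.
rows = [{"as_list": list(time1)}]

# MATCH (b:time2): each existing row is paired with every b,
# but only one row is left, so this gives 1 x 3 = 3 rows.
rows = [{**row, "b": b} for row in rows for b in time2]

# RETURN as_list, collect(b): as_list acts as the grouping key and is
# identical on every row, so one group holds all three b values.
groups = {}
for row in rows:
    groups.setdefault(tuple(row["as_list"]), []).append(row["b"])

result = [(list(key), bs) for key, bs in groups.items()]
print(result)  # [(['a1', 'a2', 'a3'], ['b1', 'b2', 'b3'])]
```

The key point is the order of operations: the first list is built while there are still only 3 rows, and the second match then multiplies against a single row instead of three.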

Or use more modern Cypher, such as COLLECT subqueries, instead:

RETURN 
    COLLECT {
        MATCH (p:Person)
        RETURN p.name
    } AS persons,
    COLLECT {
        MATCH (m:Movie)
        RETURN m.title
    } AS movies

╒══════════════════╤══════════════════════════════════╕
│persons           │movies                            │
╞══════════════════╪══════════════════════════════════╡
│["Neo", "Trinity"]│["Matrix", "The Matrix, Reloaded"]│
└──────────────────┴──────────────────────────────────┘
