Neo4j: How to pass a variable to Neo4j Apoc (apoc.path.subgraphAll) Property

1.7k Views Asked by At

Am new to Neo4j and trying to do a POC by implementing a graph DB for Enterprise Reference / Integration Architecture (Architecture showing all enterprise applications as Nodes, Underlying Tables / APIs - logically grouped as Nodes, integrations between Apps as Relationships.

Objective is to achieve seamlessly 'Impact Analysis' using the strength of Graph DB (Note: I understand this may be an incorrect approach to achieve whatever am trying to achieve, so suggestions are welcome)

Let me come brief my question now,

There are four Apps - A1, A2, A3, A4; A1 has set of Tables (represented by a node A1TS1) that's updated by Integration 1 (relationship in this case) and the same set of tables are read by Integration 2. So the Data model looks like below

(A1TS1)<-[:INT1]-(A1)<-[:INT1]-(A2)
(A1TS1)-[:INT2]->(A1)-[:INT2]->(A4)

I have the underlying application table names captured as a List property in A1TS1 node.

Let's say one of the app table is altered for a new column or Data type and I wanted to understand all impacted Integrations and Applications. Now am trying to write a query as below to retrieve all nodes & relationships that are associated/impacted because of this table alteration but am not able to achieve this

Expected Result is - all impacted nodes (A1TS1, A1, A2, A4) and relationships (INT1, INT2)

Option 1 (Using APOC)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(type(r)) as allr 
CALL apoc.path.subgraphAll(STRTND, {relationshipFilter:allr}) YIELD nodes, relationships 
RETURN nodes, relationships

This faile with error Failed to invoke procedure 'apoc.path.subgraphAll': Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String

Option 2 (Using with, unwind, collect clause)

MATCH (a {TCName:'A1TS1',AppName:'A1'})-[r]-(b) 
WITH a as STRTND, Collect(r) as allr 
UNWIND allr as rels 
MATCH p=()-[rels]-()-[rels]-() 
RETURN p

This fails with error "Cannot use the same relationship variable 'rels' for multiple patterns" but if I use the [rels] once like p=()-[rels]=() it works but not yielding me all nodes

Any help/suggestion/lead is appreciated. Thanks in advance

Update Trying to give more context Showing the Underlying Data

MATCH (TC:TBLCON) RETURN TC

"TC"
{"Tables":["TBL1","TBL2","TBL3"],"TCName":"A1TS1","AppName":"A1"}  
{"Tables":["TBL4","TBL1"],"TCName":"A2TS1","AppName":"A2"} 

MATCH (A:App) RETURN A

"A"                                                                   
{"Sponsor":"XY","Platform":"Oracle","TechOwnr":"VV","Version":"12","Tags":["ERP","OracleEBS","FinanceSystem"],"AppName":"A1"}               
{"Sponsor":"CC","Platform":"Teradata","TechOwnr":"RZ","Tags":["EDW","DataWarehouse"],"AppName":"A2"}   

MATCH ()-[r]-() RETURN distinct r.relname

"r.relname"
"FINREP"   │  (runs between A1 to other apps)
"UPFRNT"   │  (runs between A2 to different Salesforce App)
"INVOICE"  │  (runs between A1 to other apps)

With this, here is what am trying to achieve Assume "TBL3" is getting altered in App A1, I wanted to write a query specifying the table "TBL3" in match pattern, get all associated relationships and connected nodes (upstream)

May be I need to achieve in 3 steps, Step 1 - Write a match pattern to find the start node and associated relationship(s) Step 2 - Store that relationship(s) from step 1 in a Array variable / parameter Step 3 - Pass the start node from step 1 & parameter from step 2 to apoc.path.subgraphAll to see all the impacted nodes

This may conceptually sound valid but how to do that technically in neo4j Cypher query is the question.

Hope this helps

2

There are 2 best solutions below

0
Mani Eswaran On BEST ANSWER

WIth some lead from cybersam's response, the below query gets me what I want. Only constraint is, this result is limited to 3 layers (3rd layer through Optional Match)

MATCH (TC:TBLCON) WHERE 'TBL3' IN TC.Tables 
CALL apoc.path.subgraphAll(TC, {maxLevel:1}) YIELD nodes AS invN, relationships AS invR
WITH TC, REDUCE (tmpL=[], tmpr IN invR | tmpL+type(tmpr)) AS impR
MATCH FLP=(TC)-[]-()-[FLR]-(SL) WHERE type(FLR) IN impR 
WITH FLP, TC, SL,impR 
OPTIONAL MATCH SLP=(SL)-[SLR]-() WHERE type(SLR) IN impR RETURN FLP,SLP

This works for my needs, hope this might also help someone.

Thanks everyone for the responses and suggestions

****Update****

Enhanced the query to get rid of Optional Match criteria and other given limitations

MATCH (initTC:TBLCON) WHERE $TL IN initTC.Tables 
WITH Reduce(O="",OO in Reduce (I=[], II in collect(apoc.node.relationship.types(initTC)) | I+II) | O+OO+"|") as RF
MATCH (TC:TBLCON) WHERE $TL IN TC.Tables 
CALL apoc.path.subgraphAll(TC,{relationshipFilter:RF}) YIELD nodes, relationships
RETURN nodes, relationships

Thanks all (especially cybersam)

4
cybersam On

This query may do what you want:

MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
MATCH p=(tc)-[:Foo*]-()
WITH tc,
  REDUCE(s = [], x IN COLLECT(NODES(p)) | s + x) AS ns,
  REDUCE(t = [], y IN COLLECT(RELATIONSHIPS(p)) | t + y) AS rs
UNWIND ns AS n
WITH tc, rs, COLLECT(DISTINCT n) AS nodes
UNWIND rs AS rel
RETURN tc, nodes, COLLECT(DISTINCT rel) AS rels;

It assumes that you provide the name of the table of interest (e.g., "TBL3") as the value of a table parameter. It also assumes that the relationships of interest all have the Foo type.

It first finds tc, the TBLCON node(s) containing that table name. It then uses a variable-length non-directional search for all paths (with non-repeating relationships) that include tc. It then uses COLLECT twice: to aggregate the list of nodes in each path, and to aggregate the list of relationships in each path. Each aggregation result would be a list of lists, so it uses REDUCE on each outer list to merge the inner lists. It then uses UNWIND and COLLECT(DISTINCT x) on each list to produce a list with unique elements.

[UPDATE]

If you differentiate between your relationships by type (rather than by property value), your Cypher code can be a lot simpler by taking advantage of APOC functions. The following query assumes that the desired relationship types are passed via a types parameter:

MATCH (tc:TBLCON)
WHERE $table IN tc.Tables
CALL apoc.path.subgraphAll(
  tc, {relationshipFilter: apoc.text.join($types, '|')}) YIELD nodes, relationships
RETURN nodes, relationships;