Two collections exist;
- The
nodescollection - No values here - The
edgesedge collection - single boolean value -v
I have a persistent index on edges with the following fields: _to, v
Starting from nodes/a, I would like to retrieve all paths that have at least one edge with edge.v == true
Currently, I achieve it with the following query:
for v, e, p in 1..5 inbound "nodes/a" edges
FILTER p.edges[*].v ANY == true
return p
The query works, however, Arango is refusing to use the persistent vertex-centric index I have set.
Profile:
Query String (94 chars, cacheable: false):
for v, e, p in 1..5 inbound "nodes/a" edges
FILTER p.edges[*].v ANY == true
return p
Execution plan:
Id NodeType Calls Items Filtered Runtime [s] Comment
1 SingletonNode 1 1 0 0.00001 * ROOT
2 TraversalNode 1 9 0 0.00162 - FOR v /* vertex */, p /* paths: vertices, edges */ IN 1..5 /* min..maxPathDepth */ INBOUND 'nodes/a' /* startnode */ edges
3 CalculationNode 1 9 0 0.00014 - LET #7 = (p.`edges`[*].`v` any == true) /* simple expression */
4 FilterNode 1 6 3 0.00009 - FILTER #7
5 ReturnNode 1 6 0 0.00002 - RETURN p
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
2 edge edge edges false false false 77.78 % [ `_to` ] [ ] base INBOUND
Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter / Prune Conditions
2 1..5 edges uniqueVertices: none, uniqueEdges: path
Optimization rules applied:
Id RuleName
1 optimize-traversals
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Cache Hits/Misses Filtered Peak Mem [b] Exec Time [s]
0 0 0 45 10 / 0 3 32768 0.00378
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00011
optimizing ast 0.00000
loading collections 0.00001
instantiating plan 0.00019
optimizing plan 0.00101
instantiating executors 0.00052
executing 0.00194
finalizing 0.00005
It is worth noting that if I change ANY to ALL, the index is used.
In a real life scenario, nodes/a is a supernode with potentially hundreds of thousands of edges at the first hop, and this query takes a long time to execute.
How can I optimize this query to use an index?
