I'm working with ArangoDB and have a graph traversal scenario where I need to skip a specific node based on a property, but still infer an indirect connection (edge) between two other nodes. My graph contains edges from A to B and B to C, but not directly from A to C. Node B has a property `ShouldSkip` set to true, and I want to skip it in the traversal results.
The desired outcome is to get edges: edge from A to C (inferred), and nodes: A and C, effectively skipping B in the results. However, since there's no direct edge from A to C in the graph, I'm not sure how to represent this in the query results.
Here's my current AQL query:
    LET startNodeId = 'A' // Example start node
    LET depth = 2
    LET startNode = DOCUMENT('nodeCollectionName', startNodeId)
    LET traversalResults = (
        FOR v, e IN 1..depth OUTBOUND startNode GRAPH 'graphName'
            FILTER v.ShouldSkip != true
            LIMIT 100
            RETURN {node: v, edge: e}
    )
    LET allNodes = (
        FOR tr IN traversalResults
            RETURN tr.node
    )
    LET allEdges = (
        FOR tr IN traversalResults
            RETURN tr.edge
    )
    RETURN {StartNode: startNode, Nodes: UNIQUE(FLATTEN(allNodes)), Edges: UNIQUE(FLATTEN(allEdges))}
How can I adjust this query to infer only an edge from A to C (without A to B and B to C)? Or is there a better approach in ArangoDB, such as creating a virtual A-to-C edge at indexing time (which I would much rather avoid)?
Effectively, I would like the response to be: nodes: [A, C], edges: [{_from: A, _to: C}]

If you have a graph like `A -> B -> C` and you want to infer `A -> C` (skipping node B), you would need to modify the AQL query to perform a conditional traversal.
The key is to collect the paths and then manually construct the edges you aim to include based on the nodes that do not have the `ShouldSkip` property set to true.

`paths` collects all the paths from the starting node up to the specified depth, making sure that none of the nodes in the path should be skipped.

Then, `inferredEdges` constructs the edges from the first node in the path to the last node, provided that neither has the `ShouldSkip` property set to true.

Even if there is no direct edge from A to C in the graph, you can infer one as long as there is a path that does not include any nodes that should be skipped.

That should give you the structure you want, with node B being excluded from the output.
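The query being described is not reproduced here. As a minimal sketch of its general shape, and not necessarily the exact original, it might look like the following. The names `paths` and `inferredEdges` come from the description; the collection name, graph name, and start key are the placeholders from the question, and the final `RETURN` shape is an assumption.

```aql
// Sketch only: 'nodeCollectionName', 'graphName' and the key 'A' are placeholders.
LET startNode = DOCUMENT('nodeCollectionName', 'A')
LET depth = 2

// Collect every path in which no vertex is flagged ShouldSkip.
LET paths = (
  FOR v, e, p IN 1..depth OUTBOUND startNode GRAPH 'graphName'
    FILTER p.vertices[*].ShouldSkip ALL != true
    RETURN p
)

// Build one edge from the first to the last vertex of each path.
LET inferredEdges = (
  FOR p IN paths
    LET fromVertex = FIRST(p.vertices)
    LET toVertex = LAST(p.vertices)
    FILTER fromVertex.ShouldSkip != true AND toVertex.ShouldSkip != true
    RETURN DISTINCT { _from: fromVertex._id, _to: toVertex._id }
)

RETURN {
  StartNode: startNode,
  Nodes: UNIQUE(FLATTEN(paths[*].vertices)),
  Edges: inferredEdges
}
```

Note that `ALL != true` rejects any path containing a flagged vertex. In the A, B, C example every path from A passes through B, so this version would return no inferred edges.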
The purpose of the filter `FILTER p.vertices[*].ShouldSkip ALL != true` in the `paths` section is to make sure any path that is returned does not include any vertices (nodes) that have the `ShouldSkip` property set to `true`.

However, you might actually want to skip only node B but still want to infer a connection between nodes A and C. In that case, the filter condition in the `paths` section should be adjusted to allow paths through node B but still exclude node B from the final results.

Here is the adjustment needed for the query:
The `FILTER` statement inside the `paths` subquery now allows the path to include the startNode (A in your case) even if it should be skipped according to the `ShouldSkip` property, but it makes sure the second node in the path (B) is not skipped.

That way, the path from A to C through B is allowed, but node B is still not included in the final results. The paths that include a `ShouldSkip` node anywhere else are still excluded.

The `COLLECT` statement is used to group the paths by their start and end nodes, and then for each group, it filters out any paths that do not meet the criteria of excluding the `ShouldSkip` nodes.

Finally, the `LET inferredEdges` part creates the edges that you want to include in your results based on the filtered paths. The `LET validPaths` inner loop filters out paths that directly involve a `ShouldSkip` node as the second node.

Again, this assumes that you are only interested in paths where the immediate node after the start node does not have the `ShouldSkip` property set.

With these adjustments, the query should give you an inferred edge from A to C without including any edges to or from B, as long as B is the only node that should be skipped on the path from A to C. The result set will then exclude node B, while still allowing the traversal to indirectly connect A and C.
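For comparison, here is a different and somewhat simpler sketch of the same goal. It is not the query described above: instead of grouping with `COLLECT`, it keeps only non-skipped end vertices and links each one to the nearest surviving vertex earlier on its path. The names `hops`, `kept`, and `prev` are hypothetical, and the collection and graph names are the question's placeholders.

```aql
// Sketch only: 'nodeCollectionName', 'graphName' and the key 'A' are placeholders.
LET startNode = DOCUMENT('nodeCollectionName', 'A')
LET depth = 2

LET hops = (
  FOR v, e, p IN 1..depth OUTBOUND startNode GRAPH 'graphName'
    // Never return a flagged vertex as a result node.
    FILTER v.ShouldSkip != true
    // Surviving vertices on this path: the start node plus every unflagged vertex.
    LET kept = p.vertices[* FILTER CURRENT._id == startNode._id OR CURRENT.ShouldSkip != true]
    // The surviving vertex just before v becomes the source of the edge.
    LET prev = kept[-2]
    RETURN DISTINCT { node: v, edge: { _from: prev._id, _to: v._id } }
)

RETURN {
  StartNode: startNode,
  Nodes: APPEND([startNode], hops[*].node, true),
  Edges: UNIQUE(hops[*].edge)
}
```

For the path A, B, C, `kept` is `[A, C]`, so the only edge produced runs from A to C, and B appears in neither list. Several flagged vertices in a row are collapsed the same way, provided `depth` is large enough to reach the next unflagged vertex.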
By contrast, Shahar Shokrani's answer is simpler: it avoids complex filtering and grouping logic. By using the `UNIQUE` function to deduplicate nodes and edges, the solution makes sure the result set is minimized and does not contain redundant information.

However:
- The OP's solution uses the `ANY` keyword in the traversal, which does not specify the direction of the edges. My solution explicitly uses `OUTBOUND`, which respects the direction from A to C. If the graph has edges in both directions, or if direction matters, that can be important.
- My query has a provision to include the start node in the results even if it has the `ShouldSkip` property set to `true`. The OP's query does not account for this scenario and would exclude the start node if it had `ShouldSkip` set to `true`.
- My query attempts to handle the depth of the traversal more explicitly by allowing for intermediate nodes to be skipped. The OP's solution infers edges only between consecutive nodes after filtering, which may not correctly infer all the necessary edges if there are multiple `ShouldSkip` nodes in sequence.
- My solution includes logic to specifically filter out paths that have a `ShouldSkip` node as the second node in the path, which suggests a consideration for the position of the node within the path. The OP's solution does not distinguish between the positions of the nodes within the path; it simply filters out all nodes with `ShouldSkip` set to `true`.
- My query makes sure only paths without any `ShouldSkip` nodes (except potentially the start node) are considered for edge inference. That implies an understanding of continuity in the path from A to C, which is critical if the indirect connection between A and C should only be inferred if there is a continuous path without skipped nodes. The OP's approach does not enforce this continuity, which could lead to inferred edges that do not represent a valid traversal within the original graph structure.
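To make the `ANY` versus `OUTBOUND` point concrete, here is a small sketch that runs both directions from the same vertex. It assumes the question's placeholder collection and graph names, vertex keys `A`, `B`, and `C`, and only the two edges A to B and B to C.

```aql
// Sketch only: start from C, the end of the chain A -> B -> C.
LET startNode = DOCUMENT('nodeCollectionName', 'C')

RETURN {
  // Follows edges only in their stored direction.
  outbound: (FOR v IN 1..2 OUTBOUND startNode GRAPH 'graphName' RETURN v._key),
  // Follows edges in either direction.
  any: (FOR v IN 1..2 ANY startNode GRAPH 'graphName' RETURN v._key)
}
```

Under those assumptions, the `OUTBOUND` traversal finds nothing because C has no outgoing edges, while the `ANY` traversal walks backwards to B and A. That is why an inferred `_from`/`_to` pair built from an `ANY` traversal may not match the direction of the stored edges.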