ArangoDB Graph Traversal: Excluding a Node Based on Property While Inferring Indirect Connections

107 Views Asked by At

I'm working with ArangoDB and have a graph traversal scenario where I need to skip a specific node based on a property, but still infer an indirect connection (edge) between two other nodes. My graph contains edges from A to B and B to C, but not directly from A to C. Node B has a property `ShouldSkip` set to true, and I want to skip it in the traversal results.

The desired outcome is to get edges: edge from A to C (inferred), and nodes: A and C, effectively skipping B in the results. However, since there's no direct edge from A to C in the graph, I'm not sure how to represent this in the query results.

Here's my current AQL query:

LET startNodeId = 'A' // Example start node
LET depth = 2
LET startNode = DOCUMENT('nodeCollectionName', startNodeId)

LET traversalResults = (
    FOR v, e IN 1..depth OUTBOUND startNode GRAPH 'graphName'
    FILTER v.ShouldSkip != true 
    LIMIT 100
    RETURN {node: v, edge: e}
)                

LET allNodes = (
    FOR tr IN traversalResults
    RETURN tr.node
)

LET allEdges = (
    FOR tr IN traversalResults
    RETURN tr.edge
)

RETURN {StartNode: startNode, Nodes: UNIQUE(FLATTEN(allNodes)), Edges: UNIQUE(FLATTEN(allEdges))}

How can I adjust this query to infer an edge from A to C (only! without A to B and B to C), or is there a better approach to achieve this in ArangoDB (like while in indexing create a virtual edge of A to C - much less preferable)?

enter image description here

Effectively I would like the response to be: nodes: [A,C] edges: [{_from: A, _to: C}]

2

There are 2 best solutions below

6
VonC On BEST ANSWER

If you have a graph like this:

A ──> B ──> C
     (X)

And you want to infer (skipping node B):

A ──> C

You would need to modify the AQL query to perform a conditional traversal.
The key is to collect the paths and then manually construct the edges you aim to include based on the nodes that do not have the ShouldSkip property set to true.

LET startNodeId = 'A' // Example start node
LET depth = 2
LET startNode = DOCUMENT('nodeCollectionName', startNodeId)

LET paths = (
    FOR v, e, p IN 1..depth OUTBOUND startNode GRAPH 'graphName'
    FILTER p.vertices[*].ShouldSkip ALL != true // Filter out paths containing nodes with ShouldSkip == true
    RETURN p
)

LET inferredEdges = (
    FOR path IN paths
    LET sourceNode = path.vertices[0] // The source node A
    LET targetNode = LAST(path.vertices) // The target node C
    FILTER sourceNode.ShouldSkip != true AND targetNode.ShouldSkip != true // Making sure neither A nor C should be skipped
    RETURN { _from: sourceNode._id, _to: targetNode._id }
)

RETURN {
    StartNode: startNode,
    Nodes: [startNode._key, (FOR edge IN inferredEdges RETURN DOCUMENT(edge._to)._key)], // Assuming _key is used as the node identifier
    Edges: inferredEdges
}

paths collects all the paths from the starting node up to the specified depth, making sure that none of the nodes in the path should be skipped.
Then, inferredEdges constructs the edges from the first node in the path to the last node, provided that neither has the ShouldSkip property set to true.

Even if there is no direct edge from A to C in the graph, you can infer one as long as there is a path that does not include any nodes that should be skipped.
That should give you the structure you want, with node B being excluded from the output.


Not sure why do we need the FILTER p.vertices[*].ShouldSkip ALL != true in the Paths section.

The purpose of the filter FILTER p.vertices[*].ShouldSkip ALL != true in the paths section is to make sure any path that is returned does not include any vertices (nodes) that have the ShouldSkip property set to true.


However, you might actually want to skip only node B but still want to infer a connection between nodes A and C.
In that case, the filter condition in the paths section should be adjusted to allow paths through node B but still exclude node B from the final results.

Here is the adjustment needed for the query:

LET startNodeId = 'A' // Example start node
LET depth = 2
LET startNode = DOCUMENT('nodeCollectionName', startNodeId)

LET paths = (
    FOR v, e, p IN 1..depth OUTBOUND startNode GRAPH 'graphName'
    FILTER v.ShouldSkip != true OR v._id == startNode._id // Allow the startNode even if it should be skipped
    COLLECT sourceNode = p.vertices[0], targetNode = LAST(p.vertices) INTO groupedPaths
    FILTER LENGTH(groupedPaths) > 0 // Making sure we have at least one valid path
    LET validPaths = (
        FOR group IN groupedPaths
        LET path = group.p
        FILTER path.vertices[1].ShouldSkip != true // Check the second node in the path
        RETURN path
    )
    FILTER LENGTH(validPaths) > 0 // We have a path that does not include skipped nodes except the start
    RETURN { _from: sourceNode._id, _to: targetNode._id }
)

LET inferredEdges = (
    FOR path IN paths
    RETURN { _from: path._from, _to: path._to }
)

LET finalNodes = (
    FOR edge IN inferredEdges
    RETURN DOCUMENT(edge._to)._key
)

RETURN {
    StartNode: startNode._key,
    Nodes: UNIQUE([startNode._key] + finalNodes),
    Edges: inferredEdges
}

The FILTER statement inside the paths subquery now allows the path to include the startNode (A in your case) even if it should be skipped according to the ShouldSkip property, but it makes sure the second node in the path (B) is not skipped.
That way, the path from A to C through B is allowed, but node B is still not included in the final results. The paths that include a ShouldSkip node anywhere else are still excluded.

The COLLECT statement is used to group the paths by their start and end nodes, and then for each group, it filters out any paths that do not meet the criteria of excluding the ShouldSkip nodes.
Finally, the LET inferredEdges part creates the edges that you want to include in your results based on the filtered paths. The LET validPaths inner loop filters out paths that directly involve a ShouldSkip node as the second node.

Again, this assumes that you are only interested in paths where the immediate node after the start node does not have the ShouldSkip property set.
With these adjustments, the query should give you an inferred edge from A to C without including any edges to or from B, as long as B is the only node that should be skipped on the path from A to C. The result set will then exclude node B, while still allowing the traversal to indirectly connect A and C.


By contrast, Shahar Shokrani's answer is simpler: It avoids complex filtering and grouping logic.
By using the UNIQUE function to deduplicate nodes and edges, the solution makes sure the result set is minimized and does not contain redundant information.

However:

  • The OP's solution uses the ANY keyword in the traversal, which does not specify the direction of the edges. My solution explicitly uses OUTBOUND, which respects the direction from A to C. If the graph has edges in both directions or the direction, that can be important.

  • My query has a provision to include the start node in the results even if it has the ShouldSkip property set to true. The OP's query does not account for this scenario and would exclude the start node if it had ShouldSkip set to true.

  • My query attempts to handle the depth of the traversal more explicitly by allowing for intermediate nodes to be skipped. The OP's solution infers edges only between consecutive nodes after filtering, which may not correctly infer all the necessary edges if there are multiple ShouldSkip nodes in sequence.

  • My solution includes logic to specifically filter out paths that have a ShouldSkip node as the second node in the path, which suggests a consideration for the position of the node within the path. The OP's solution does not distinguish between the positions of the nodes within the path; it simply filters out all nodes with ShouldSkip set to true.

  • My query makes sure only paths without any ShouldSkip nodes (except potentially the start node) are considered for edge inference. That implies an understanding of continuity in the path from A to C, which is critical if the indirect connection between A and C should only be inferred if there is a continuous path without skipped nodes. The OP's approach does not enforce this continuity, which could lead to inferred edges that do not represent a valid traversal within the original graph structure.

0
Shahar Shokrani On

Another solution,

  1. Find all paths.
  2. Prepare a filtered nodes array by the property
  3. Inferre a run-time edges according to the filtered nodes array.
  4. Return them all distinct.

Full code:

// Define the starting node ID and the depth for the graph traversal
LET startNodeId = @startNode
LET depth = @depth
// Retrieve the document (node) corresponding to the startNodeId from the specified collection
LET startNode = DOCUMENT(@nodeCollectionName, startNodeId)

// Step 1: Find all paths from the start node up to the specified depth in the graph
LET allPaths = (
    FOR v, e, p IN 1..depth ANY startNode GRAPH @graphName
    // Return each path found during the traversal
    RETURN p
)

// Step 2: Filter the paths' nodes based on the 'ShouldSkip' property
LET filteredPaths = (
    FOR path IN allPaths
    // Filter out vertices in each path where 'ShouldSkip' is false
    LET filteredVertices = (FOR vertex IN path.vertices FILTER vertex.ShouldSkip == false RETURN vertex)
    // Return the filtered vertices along with the original path
    RETURN {{vertices: filteredVertices, originalPath: path }}
)

// Step 3: Infer the edges of the filtered nodes paths
LET inferredEdges = FLATTEN(
    FOR filteredPath IN filteredPaths
    // Create new edges between consecutive visible vertices
    LET edges = (
        FOR i IN 0..LENGTH(filteredPath.vertices) - 2
        LET startVertex = filteredPath.vertices[i]
        LET endVertex = filteredPath.vertices[i + 1]
        // Return a new edge from startVertex to endVertex
        RETURN {{_from: startVertex._id, _to: endVertex._id }}
    )
    // Flatten the list of edges for all paths
    RETURN edges
)

// Deduplicate the inferred edges
LET distinctEdges = UNIQUE(inferredEdges)

// Extract and deduplicate nodes from the filtered paths
LET allNodes = FLATTEN(
    FOR path IN filteredPaths
    // Extract vertices from each filtered path
    RETURN path.vertices
)
// Deduplicate the nodes
LET distinctNodes = UNIQUE(allNodes)

// Return the start node, distinct nodes, and distinct edges
RETURN 
{{
    StartNode: startNode,
    Nodes: distinctNodes,
    Edges: distinctEdges
}}