Connect nearby nodes in geographic graph network

89 Views Asked by At

I'm building a graph network of rivers. So far, I've created a networkx graph of lat/lon points on rivers. Each river has edges between it's points. However, no edges exist between different rivers in my dataset. Now I want to connect rivers that geographically intersect, so for example here: enter image description here

I want to add an edge at the red arrow between the end node on the Missouri river and the nearest node on the Mississippi river.

How can this be accomplished? I could iterate through every pair of nodes, calculate their great-circle distance, and add an edge if it's under the distance limit. This could work but seems slow and I wonder if there's a more built-in way to do this?

2

There are 2 best solutions below

0
Paul Brodersen On
  1. You only need to check the leaves of each component (=river), i.e. nodes with degree 1 (i.e. the endpoints of each river arm). If the graph is directed, you can further restrict your search to nodes with either in-degree 1 or out-degree 1, depending how the directionality of the edges corresponds to the directionality of the rivers.
my_leaves = [node for node in G if G.degree(node) == 1]  
  1. You can use scipy.spatial.KDTree.query_ball_point to efficiently find all other nodes within a specified distance. Then select the nearest node that is not part of the same component as the leaf.
tree = KDTree(my_node_positions)
neighbours = tree.query_ball_point(my_leaf_node_positions, radius=my_cutoff)
0
ravenspoint On

The snag is that the node on the end of one river might not be 'close' to any node on the other river. Like this:

enter image description here

So you must check every leaf node for its distance from the line between every pair of nodes on other rivers.

The algorithm for calculating the distance of a point from a line segment is easily found by googling. Note that you do not need any fancy great circle distance calculations - the distances you are interested in finding are sufficiently approximated by assuming a flat plane.

If you have a very large dataset, like maybe all the rivers in North America, you will need to prune the line segments you will check to those in the approximate neighborhood - this can be done by using a quadtree ( see https://github.com/JamesBremner/quadtree )

Alternative Approach

Preprocess your dataset, adding nodes along the lines between nodes that are far apart so that no node is no more than a specified distance from any possible merge point. This makes the finding of the merge point simpler by looking only at node-node distances rather than node-line segment distances. If you need to this a lot then the cost of pre-processing and the storage for the extra nodes might be worthwhile - however, I cannot image why you would need to this more than once.