Caching BFS traversal in an undirected and unweighted graph

172 Views Asked by At

I have an undirected and unweighted graph with about a million nodes - visualized in a matrix format.

Representation of a sample graph:

![enter image description here

The red cells are blocked.

My problem is to find the shortest distance between any two cells. The source and destination cells will be given one by one, and I need to return the shortest paths for all given cell pairs. The traversal is manhattan only (up,down,left,right -no diagonal).No path can go through the blocked cells.

I looked at the following:

  • the Floyd-warshall algorithm: but it takes too much memory.
  • Dijkstra's algorithm for all nodes, but it is too slow.
  • A* could work, but would still be slow - if the shortest path needs to be found for all pairs.

Is there any other approach for it?

Since it's just a uniform matrix, I would think that there might be some pattern I am unaware of.

Can a BFS traversal be cached somehow, so that a search can resume at an already visited node rather than starting from scratch again?

2

There are 2 best solutions below

3
Matt Timmermans On

If your graph is a 1000x1000 grid, then let every cell (x,y) be a "border cell" if x%100 = 0 or y%100 = 0.

This divides your graph into 100 regions separated by borders.

If you need to find a shortest path between points in the same region, then you can just use BFS within the region, and you will search at most 10K nodes.

When you need to find a path between different regions, then it will always consist of a path to a border node, maybe a path between border nodes, and then a path from a border node to the target.

You can accelerate the search by using Floyd-Warshall to precompute the shortest path distance between every pair of border cells. There are only about 20000 of those, so this will require storing about 200 million lengths.

Your BFS search will then be limited to edges within the source region, F-W paths between the source and target regions, and edges within the target region.

This will provide a speed-up of 20-50x for worst-case searches.

8
ravenspoint On

Since your vertices are located on an orthogonal grid, a "Manhattan" path between two cells can be calculated instantly. e.g from 1,3 to 5,4 can go 1,3; 1,4; 2,4; 3,4; 4,4; 5,4.

Since no path can go through a blocked vertex: Check as you go along and change the dimension being altered if you hit a red, blocked cell.

Your performance requirement suggests you need about 1 millisecond per path calculation ( for a million paths in in 20 minutes ). That might just be possible if you can use Manhattan paths, otherwise no.

Here is some code to calculate the Manhattan path

std::vector<std::pair<int, int>>
path(int sx, int sy, int dx, int dy)
{
std::vector<std::pair<int, int>> ret;
ret.push_back(std::make_pair(sx, sy));
int deltax = dx - sx;
int deltay = dy - sy;
if (deltax)
{
    int inc = 1;
    if (deltax < 0)
        inc = -1;
    int x = sx;
    do
    {
        x += inc;
        ret.push_back(std::make_pair(x, sy));
    } while (x != dx);
}
if (deltay)
{
    int inc = 1;
    if (deltay < 0)
        inc = -1;
    int y = sy;
    do
    {
        y += inc;
        ret.push_back(std::make_pair(dx, y));
    } while (y != dy);
}
return ret;

}

Complete application at https://github.com/JamesBremner/so76134936

Test run:

C:\Users\James\code\so76134936\bin>starter
Input source location (x  y ) and destination location, e.g:
 1 3 6 7
10 3 6 7
raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
       1        9.9e-06 9.9e-06 path calculation
10 3
9 3
8 3
7 3
6 3
6 4
6 5
6 6
6 7

test run for long path:

Input source location (x  y ) and destination location, e.g:
 1 3 6 7
 10 3 600 800
raven::set::cRunWatch code timing profile
Calls       Mean (secs) Total       Scope
       1    7.25e-05    7.25e-05    path calculation
10 3
11 3
12 3
13 3
14 3
15 3
16 3
17 3
18 3
19 3
20 3
21 3
22 3
23 3
24 3
25 3
26 3
27 3
    ...
600 791
600 792
600 793
600 794
600 795
600 796
600 797
600 798
600 799
600 800

Note: if blocks were accounted for, performance would be slower.

Conclusion: Your performance requirement ( 1 millisecond per path ) is achievable.

I have checked in code that checks for a blocked path. It simply tries going around the other way and gives up if that is also blocked. You will need something more sophisticated, but this code should give you a good place to start. ( If you would like me to proceed further, get in touch through the github repo and we can discuss details )