How to obtain all pairs shortest path length for large graph with 100k(100,000) nodes?

289 Views Asked by At

Shortest path with 10,0000 nodes, 5 billion edges.

Suppose matrix $A$ is the symmetric weighted ajacency matrix (Undirected Graph), and element $A_{ij}$ represents the weight of edge $e_{ij}$, suppose the matrix size is $100000\times 100000$, so here are $n(n-1)/2=4999950000$ edges, essentially 5 billion edges, I wish to obtain two matrixes $L$ and $W$:

Let $P_{ij}=\left{v_1, v_2, \cdots, v_n\right}$ represents one of the shortest path with $n$ nodes from $i$ to $j$ (or $j$ to $i$ due to symmetric), hence we can calculate the following two quanties [L_{ij} = n-1, W_{ij} = \sum_{i=1}^{n-1}A_{v_i,v_{i+1}}]

(1) The Matrix $L$ of weighted shortest path length, whiere $L_{ij}$ represents the length of shortest from node $i$ to node $j$.

(2) The matrix $W$ of shortest path cumulative edge weights, where $W_{ij}$ represents the cumulative edge weight along the path.

Here I got that for all pair shortest path, there are algorithms like Floyd Warshall, Johnson algorithms, but it would require hundreds of days to calculate all pairs path length, what is the possible suitable solution for this?

I tried Python igraph with python code

import tqdm
import numpy as np
import pandas as pd
import igraph as ig

def all_pairs_shortest_length(ajacency_matrix):
    m, n = ajacency_matrix.shape # ajacency_matrix is weighted
    L = np.zeros((m,n), dtype=int) # L is symmetric
    G = ig.Graph.Weighted_Adjacency(ajacency_matrix, mode = 'undirected') # 340G memory to construct graph with 65000 nodes
    for v in tqdm.trange(m):
        # v as start node, only calculate target nodes  from v+1 to m
        paths_rest = ig.GraphBase.get_all_shortest_paths(G, v=v, to = range(v+1, m), weights=G.es["weight"], mode="out")
        target_with_length = pd.DataFrame([[path[-1],len(path)-1] for path in paths_rest], columns=['target','length']).drop_duplicates(ignore_index=True)
        # for weighted shortest path, choose the path with minimum number of edges
        L[v,v+1:] = target_with_length.groupby('target', group_keys=True).apply(min).length
    L = L + L.T
    M = np.array(ig.GraphBase.distances(G, weights=G.es["weight"]))
    return L, M

if __name__=='__main__':
    test_N = 100000
    b = np.random.random_integers(1,100, size=(test_N, test_N))
    ajacency_matrix = (b + b.T)/2
    np.fill_diagonal(ajacency_matrix, 0)
    L, M = all_pairs_shortest_length(ajacency_matrix)
1

There are 1 best solutions below

2
ravenspoint On

The Dijkstra algorithm is the fastest way to find the shortest paths between nodes. One run of the algorithm finds the shortest paths from one node to every other node. To find the shortest path between every pair of nodes will require many runs on a directed graph. An undirected graph will require fewer runs since the shortest path between u and v is just the reverse of the path between v and u. You do not tell us if your graph is directed or not.

In any case, this does take a long time for large graphs, though if your implementation of Dijkstra is well coded you will need minutes rather than days.

Getting reasonable performance from code written in Python will be a challenge. If you care about performance, you should consider a compiled coding language such as C++ which can give up to 50 times faster run times than the same algorithm coded in Python.

Here is some C++ code implementing the Dijkstra algorithm.

void dijsktra(
    const cGraph &g,
    int start,
    std::vector<double> &dist,
    std::vector<int> &pred)
{
    // shortest distance from start to each node
    dist.clear();
    dist.resize(g.vertexCount(), INT_MAX);

    // previous node on shortest path to each node
    pred.clear();
    pred.resize(g.vertexCount(), -1);

    std::vector<bool> sptSet(g.vertexCount(), false); // sptSet[i] will be true if vertex i is included in shortest
                                                      // path tree or shortest distance from src to i is finalized

    // Distance of source vertex from itself is always 0
    dist[start] = 0;
    pred[start] = 0;

    // Find shortest path for all vertices
    for (int count = 0; count < g.vertexCount() - 1; count++)
    {
        // Pick the minimum distance vertex from the set of vertices not
        // yet processed. u is always equal to src in the first iteration.
        int min = INT_MAX, uidx;
        for (int vidx = 0; vidx < g.vertexCount(); vidx++)
            if (sptSet[vidx] == false && dist[vidx] <= min)
            {
                min = dist[vidx];
                uidx = vidx;
            }
        if (min == INT_MAX)
        {
            // no more nodes connected to start
            break;
        }

        // Mark the picked vertex as processed
        sptSet[uidx] = true;

        // Update dist value of the adjacent vertices of the picked vertex.
        for (auto vp : g.adjacentOut(uidx))
        {
            if (sptSet[vp])
                continue; // already processed

            // Update dist[v] only if total weight of path from src to  v through u is
            // smaller than current value of dist[v]
            double cost = atof(g.rEdgeAttr(g.find(uidx, vp), 0).c_str());
            if (dist[uidx] + cost < dist[vp])
            {
                dist[vp] = dist[uidx] + cost;
                pred[vp] = uidx;
            }
        }
    }
}

You can find the code for the complete path finding application at https://github.com/JamesBremner/PathFinder

Run time for running the Dijkstra algorithm to find the cheapest paths from one randomly selected vertex to every other vertex on large graphs downloaded from https://dyngraphlab.github.io/. Longest result from 3 runs using the graphex application.

Vertex Count Edge Count Run time ( secs )
10,000 50,000 0.3
145,000 2,200,000 40

Here is an introduction the Dijkstra algorithm https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm