I've written an shortest path (SSSP) Pregel application which I'm trying to distribute to a Apache Spark Standalone Cluster via Bitnami Docker. Everything in the script is running without problem until the Pregel-function is started. Then, nothing happens. I'm using Apache Spark 3.3.2 with Scala 2.12.
The script I'm trying to run is from the Spark repo, it runs until the line val sssp = initialGraph.pregel(Double.PositiveInfinity)( where nothing happens. It doesn't stop, but it get stuck. The master log is sayin "launching executor app" "Removing executor app" "launching executor app" and so on. The worker log is just setting up, finishing, do clean-up and then repeats again.
import org.apache.spark.sql.SparkSession
/**
* An example use the Pregel operator to express computation
* such as single source shortest path
* Run with
* {{{
* bin/run-example graphx.SSSPExample
* }}}
*/
object SSSPExample {
def main(args: Array[String]): Unit = {
// Creates a SparkSession.
val spark = SparkSession
.builder
.appName(s"${this.getClass.getSimpleName}")
.getOrCreate()
val sc = spark.sparkContext
// $example on$
// A graph with edge attributes containing distances
val graph: Graph[Long, Double] =
GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
val sourceId: VertexId = 42 // The ultimate source
// Initialize the graph such that all vertices except the root have distance infinity.
val initialGraph = graph.mapVertices((id, _) =>
if (id == sourceId) 0.0 else Double.PositiveInfinity)
val sssp = initialGraph.pregel(Double.PositiveInfinity)(
(id, dist, newDist) => math.min(dist, newDist), // Vertex Program
triplet => { // Send Message
if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
} else {
Iterator.empty
}
},
(a, b) => math.min(a, b) // Merge Message
)
println(sssp.vertices.collect.mkString("\n"))
// $example off$
spark.stop()
}
}
When I check the Spark UI I can see this: Spark UI when running Pregel
Anyone having a clue what's happening? I thinks it is very strange because the Pregel API is supposed to be distributable.
I've tried to run the individual actions in the Pregel (triplet, Iterator, math.min etc) separetly and they are able to run on the Docker cluster.