GraphX - assign to each vertex of a graph the IDs of all connected elements satisfying a certain property

29 Views Asked by Prassi At 07 April 2023 at 15:09

Suppose you are modelling railway tracks and train stations using the following classes:

sealed abstract class ConnectedElement extends java.io.Serializable {
  def location: String
}
case class RailwayTrack(location: String, length: Double = 0.0)
    extends ConnectedElement
case class TrainStation(location: String, isRural: Boolean = false)
    extends ConnectedElement

In the first approximation, we can assume that the graph modelling the connections between these elements is a path graph, meaning that each RailwayTrack should sit between two TrainStation. For example, this is a possible graph (RT represents railway tracks and TS train stations):

TS - RT - RT - RT - TS - RT - RT - TS - RT - TS

We could evolve this simple model to also include other elements in the hierarchy defined above (i.e. elements that sit between railway tracks that are not train stations) or to allow train stations to be connected to multiple paths to other train stations, for example:

                   TS
                    |
                   RT
                    |
TS - RT - X - RT - TS - RT - RT - TS

Suppose the following RDD represent a GraphX Graph of the first type:

import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val spark: SparkSession = SparkSession.builder().master("local").getOrCreate()
val sc = spark.sparkContext

val elements: RDD[(VertexId, ConnectedElement)] =
  sc.parallelize(
    Seq(
      (1L, RailwayTrack("l1")),
      (2L, RailwayTrack("l2")),
      (3L, RailwayTrack("l3")),
      (4L, RailwayTrack("l4")),
      (5L, RailwayTrack("l1")),
      (6L, RailwayTrack("l4")),
      (7L, RailwayTrack("l5")),
      (8L, TrainStation("l2")),
      (9L, TrainStation("l4")),
      (10L, TrainStation("l3"))
    )
  )
val connections: RDD[Edge[String]] =
  sc.parallelize(
    Seq(
      Edge(8L, 1L),
      Edge(1L, 2L),
      Edge(2L, 3L),
      Edge(3L, 9L),
      Edge(9L, 4L),
      Edge(4L, 5L),
      Edge(5L, 6L),
      Edge(6L, 7L),
      Edge(7L, 10L)
    )
  )

val graph = Graph(elements, connections)

I would like to write an algorithm that returns something like:

val expected = sc.parallelize(Seq(
  ((1L, RailwayTrack("l1")), (8L, TrainStation("l2")), (9L, TrainStation("l4"))),
  ((2L, RailwayTrack("l2")), (8L, TrainStation("l2")), (9L, TrainStation("l4"))),
  ((3L, RailwayTrack("l3")), (8L, TrainStation("l2")), (9L, TrainStation("l4"))),
  ((4L, RailwayTrack("l4")), (9L, TrainStation("l4")), (10L, TrainStation("l3"))),
  ((5L, RailwayTrack("l1")), (9L, TrainStation("l4")), (10L, TrainStation("l3"))),
  ((6L, RailwayTrack("l4")), (9L, TrainStation("l4")), (10L, TrainStation("l3"))),
  ((7L, RailwayTrack("l5")), (9L, TrainStation("l4")), (10L, TrainStation("l3")))
))

That is: an RDD of tuples of three elements, where the first is the railway track vertex, the second and the third are the train station vertices that enclose the railway track vertex.

How can this problem be solved using GraphX? Can it be extended to more complex graphs, provided that a railway track always sits between two train stations?

Original Q&A

GraphX - assign to each vertex of a graph the IDs of all connected elements satisfying a certain property

There are 0 best solutions below

Related Questions in SCALA

Related Questions in APACHE-SPARK

Related Questions in GRAPH-THEORY

Related Questions in SPARK-GRAPHX

Trending Questions

Popular # Hahtags

Popular Questions