Which library in PySpark implements the GraphX API?


Which PySpark library offers functionality similar to spark.graphx.GraphLoader? Basically, I want to port the following Scala code to PySpark.

'''
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.{SaveMode, SparkSession}

object FindTransMatch {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")

    // Creates a SparkSession.
    val spark = SparkSession
      .builder
      .appName("FindTransMatch")
      .master("local")
      .getOrCreate()
    val sc = spark.sparkContext

    val graph = GraphLoader.edgeListFile(sc, args(0), false)

    // Find the connected profiles
    val cc = graph.connectedComponents().vertices
    spark
      .sqlContext
      .createDataFrame(cc.toJavaRDD())
      .write
      .mode(SaveMode.Overwrite).csv(args(1))
    spark.stop()
  }
}

'''

I tried installing graphframe from

but its setup.py contains only the following lines:

# Your python setup file. An example can be found at:
# https://github.com/pypa/sampleproject/blob/master/setup.py

Needless to say, I also ran

pip install graphframe

but to no avail.

I saw someone here suggest using

'''
pyspark --packages graphframes:graphframes:0.7.0-spark2.3-s_2.11
'''

but I don't understand where to set this.

There is 1 answer below.

Answer by user238607:
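Add the graphframes jar to the Spark session. PySpark has no bindings for GraphX itself, but GraphFrames gives PySpark similar graph functionality (including connected components) through a DataFrame-based API.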

The jars can be found here: https://spark-packages.org/package/graphframes/graphframes

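from pyspark.sql import SparkSession

# Point spark.jars at the downloaded graphframes jar.
# Pick the jar that matches your Spark and Scala versions.
spark = SparkSession.builder \
    .appName("MyApp") \
    .config("spark.jars", "file:/path/to/spark-jars/graphframes-0.8.2-spark3.2-s_2.12.jar") \
    .getOrCreate()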
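
For completeness, here is a rough sketch of how the Scala job from the question might look in PySpark with GraphFrames. Treat it as an illustration under assumptions, not a tested port: it uses the `spark.jars.packages` setting (the configuration counterpart of the `--packages` flag) instead of a local jar, it assumes the `graphframes` Python module becomes importable once the package is loaded, and it assumes a checkpoint directory is needed for `connectedComponents()`. The function names, the package coordinate default, and the checkpoint path are hypothetical choices for this example. `parse_edge_line` mimics what `GraphLoader.edgeListFile` does with each line (skip blank and `#` comment lines, read two whitespace-separated vertex ids).

```python
def parse_edge_line(line):
    """Parse one edge-list line into (src, dst); return None for blank or '#' lines."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split()
    if len(fields) < 2:
        raise ValueError(f"Invalid edge line: {line!r}")
    return int(fields[0]), int(fields[1])


def find_connected_components(
    input_path,
    output_path,
    # Assumption: this coordinate mirrors the jar named above; match it to your Spark/Scala build.
    package="graphframes:graphframes:0.8.2-spark3.2-s_2.12",
    # Assumption: hypothetical scratch directory for checkpoints.
    checkpoint_dir="/tmp/graphframes-checkpoints",
):
    # Imported lazily so the parsing helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("FindTransMatch")
        .master("local")
        .config("spark.jars.packages", package)  # same role as --packages on the command line
        .getOrCreate()
    )
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    # Assumption: the Python wrapper is importable after the package has been loaded.
    from graphframes import GraphFrame

    # Read the edge list file into a DataFrame with the column names GraphFrames expects.
    edge_rdd = (
        spark.sparkContext.textFile(input_path)
        .map(parse_edge_line)
        .filter(lambda edge: edge is not None)
    )
    edges = spark.createDataFrame(edge_rdd, ["src", "dst"])

    # Every id that appears in an edge becomes a vertex.
    vertices = (
        edges.select(edges.src.alias("id"))
        .union(edges.select(edges.dst.alias("id")))
        .distinct()
    )

    # Find the connected profiles and write them out as CSV.
    graph = GraphFrame(vertices, edges)
    components = graph.connectedComponents()
    components.write.mode("overwrite").csv(output_path)
    spark.stop()
```

Calling `find_connected_components("edges.txt", "out_dir")` (both paths are placeholders) should write one row per vertex with its `id` and `component`. The component labels are not guaranteed to match the ones GraphX produces, so compare the groupings rather than the label values.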