What library can I use in PySpark to get functionality similar to spark.graphx.GraphLoader? Basically, I intend to port the following Scala code to PySpark.
'''
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.{SaveMode, SparkSession}

object FindTransMatch {
  def main(args: Array[String]): Unit = {
    println("Hello, World!")
    // Create a SparkSession.
    val spark = SparkSession
      .builder
      .appName("FindTransMatch")
      .master("local")
      .getOrCreate()
    val sc = spark.sparkContext
    // Load the edge list; false = do not canonicalize edge orientation
    val graph = GraphLoader.edgeListFile(sc, args(0), false)
    // Find the connected profiles
    val cc = graph.connectedComponents().vertices
    spark
      .sqlContext
      .createDataFrame(cc.toJavaRDD())
      .write
      .mode(SaveMode.Overwrite)
      .csv(args(1))
    spark.stop()
  }
}
'''
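For context, the connected-components step in the job above labels every vertex with the smallest vertex id in its component, which is what GraphX's connectedComponents() returns. Stripped of Spark, the computation amounts to this sketch (plain Python; the helper name is mine, not part of any API):

```python
from collections import defaultdict

def connected_components(edges):
    """Label every vertex with the smallest vertex id in its component,
    mirroring the result of GraphX's connectedComponents()."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    component = {}
    for start in adj:
        if start in component:
            continue
        # Walk the whole component, then label it with its minimum id.
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        label = min(seen)
        for v in seen:
            component[v] = label
    return component

edges = [(1, 2), (2, 3), (5, 6)]
print(connected_components(edges))  # {1: 1, 2: 1, 3: 1, 5: 5, 6: 5}
```

Whatever library ends up replacing GraphLoader in PySpark, its connected-components output should match this mapping.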
I tried installing graphframes from
but its setup.py contains only the following lines:
'''
# Your python setup file. An example can be found at:
# https://github.com/pypa/sampleproject/blob/master/setup.py
'''
Needless to say, I also ran
'''
pip install graphframe
'''
but to no avail.
I saw here that someone suggested using
'''
pyspark --packages graphframes:graphframes:0.7.0-spark2.3-s_2.11
'''
but I don't understand where to set this.
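For what it's worth, the --packages flag is an argument to the pyspark (or spark-submit) launcher itself, typed in a terminal rather than inside Python. The same dependency can also be requested from code when building the SparkSession, roughly like this (a sketch; the version string must match your Spark/Scala build, and this only takes effect if set before the session is created):

```python
from pyspark.sql import SparkSession

# Ask Spark to fetch the GraphFrames package at startup, as an
# alternative to passing --packages on the command line.
spark = (SparkSession.builder
         .appName("FindTransMatch")
         .config("spark.jars.packages",
                 "graphframes:graphframes:0.7.0-spark2.3-s_2.11")
         .getOrCreate())
```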
Adding the graphframes jar lets us access GraphX-style functionality of Apache Spark from PySpark.
Jars can be found at this location: https://spark-packages.org/package/graphframes/graphframes
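With that jar on the classpath, a rough PySpark port of the Scala job might look like the following. This is an untested sketch: GraphFrames has no equivalent of GraphLoader.edgeListFile, so the edge list is read manually (assuming whitespace-separated "src dst" pairs), and connectedComponents() requires a checkpoint directory (the /tmp path here is an arbitrary choice).

```python
import sys

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = (SparkSession.builder
         .appName("FindTransMatch")
         .master("local")
         .getOrCreate())

# No edgeListFile helper in GraphFrames: build the edge DataFrame by
# hand from "src dst" pairs, then derive the vertex DataFrame from it.
edges = spark.read.csv(sys.argv[1], sep=" ").toDF("src", "dst")
vertices = (edges.select("src").union(edges.select("dst"))
            .distinct().toDF("id"))

g = GraphFrame(vertices, edges)

# connectedComponents() needs a checkpoint directory to be set first.
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
cc = g.connectedComponents()  # DataFrame with columns: id, component

cc.write.mode("overwrite").csv(sys.argv[2])
spark.stop()
```

Run it with the package flag so the jar is resolved at launch, e.g. spark-submit --packages graphframes:graphframes:0.7.0-spark2.3-s_2.11 followed by the script name and its two path arguments.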