I am using Apache GraphX (https://spark.apache.org/docs/latest/graphx-programming-guide.html), specifically its connected components functionality (https://spark.apache.org/docs/latest/graphx-programming-guide.html#connected-components).
This works fine at smaller scale, but I run into memory issues once the graph reaches about 2 million edges.
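For reference, the core of the job is essentially the standard connectedComponents call. This is a simplified sketch; the S3 paths and column names are placeholders, not my real ones:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

object ConnectedComponentsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphx-connected-components")
      .getOrCreate()

    // Edge list read from S3 (path and column names are placeholders).
    val edges = spark.read.parquet("s3://my-bucket/edges/")
      .rdd
      .map(row => Edge(row.getAs[Long]("src"), row.getAs[Long]("dst"), ()))

    // Build the graph and run connected components;
    // the result maps each vertex ID to its component ID.
    val graph = Graph.fromEdges(edges, defaultValue = ())
    val cc = graph.connectedComponents().vertices

    cc.saveAsTextFile("s3://my-bucket/cc-output/")
    spark.stop()
  }
}
```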
I trigger the GraphX job from AWS Glue, and I get the following exceptions:
```
23/08/15 22:04:15 INFO DAGScheduler: Job 323 finished: fold at VertexRDDImpl.scala:90, took 6.015887 s
23/08/15 22:04:16 INFO DAGScheduler: Got job 324 (fold at VertexRDDImpl.scala:90) with 1000 output partitions
23/08/15 22:04:20 WARN TaskSetManager: Lost task 74.0 in stage 107376.0 (TID 1121058) (172.34.182.240 executor 49): java.io.IOException: unexpected exception type
23/08/15 22:04:20 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueExceptionAnalysisTaskFailed", "Timestamp": 1692137060573, "Failure Reason": "unexpected exception type",
```