I am using Apache GraphX (https://spark.apache.org/docs/latest/graphx-programming-guide.html), specifically its connected components functionality (https://spark.apache.org/docs/latest/graphx-programming-guide.html#connected-components).
This works fine at smaller scale, but I run into memory issues once the graph reaches about 2 million edges.
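For reference, the core of the job is essentially the standard connectedComponents call. This is a simplified sketch; the S3 paths and column names are placeholders, not my real ones:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

object ConnectedComponentsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphx-connected-components")
      .getOrCreate()

    // Edge list read from S3 (path and column names are placeholders).
    val edges = spark.read.parquet("s3://my-bucket/edges/")
      .rdd
      .map(row => Edge(row.getAs[Long]("src"), row.getAs[Long]("dst"), ()))

    // Build the graph and run connected components;
    // the result maps each vertex ID to its component ID.
    val graph = Graph.fromEdges(edges, defaultValue = ())
    val cc = graph.connectedComponents().vertices

    cc.saveAsTextFile("s3://my-bucket/cc-output/")
    spark.stop()
  }
}
```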
I trigger the GraphX job from AWS Glue, and I get the following exceptions:
```
23/08/15 22:04:15 INFO DAGScheduler: Job 323 finished: fold at VertexRDDImpl.scala:90, took 6.015887 s
23/08/15 22:04:16 INFO DAGScheduler: Got job 324 (fold at VertexRDDImpl.scala:90) with 1000 output partitions
23/08/15 22:04:20 WARN TaskSetManager: Lost task 74.0 in stage 107376.0 (TID 1121058) (172.34.182.240 executor 49): java.io.IOException: unexpected exception type
23/08/15 22:04:20 ERROR GlueExceptionAnalysisListener: [Glue Exception Analysis] { "Event": "GlueExceptionAnalysisTaskFailed", "Timestamp": 1692137060573, "Failure Reason": "unexpected exception type",
```