How to convert a PySpark RDD to a DataFrame?


The RDD is generated by spark.sparkContext.parallelize(string.splitlines()). I then tried toDF(), converting to a pandas DataFrame, repartitioning, coalesce, and various other techniques, but they all fail with a max RPC size error, and we cannot modify any of the cluster configurations. The object is huge, around 18216129049 bytes (roughly 18 GB), so I'm stuck on how to take it forward.

I want it converted to a DataFrame, whether pandas or PySpark, because I have to apply some transformations and write the result to a database.
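For context, a minimal sketch of what I understand the problem to be: parallelize() ships the entire driver-side string with the task, so one huge object blows past spark.rpc.message.maxSize. Splitting the lines into batches before parallelizing (or writing the string to a file and using spark.read.text) is one way around this; the batching helper below is plain Python, and the Spark usage in the comments is an assumption about the setup, not code from the question.

```python
def batches(lines, batch_size):
    """Yield successive chunks of `lines`, each with at most `batch_size` items."""
    for i in range(0, len(lines), batch_size):
        yield lines[i:i + batch_size]

# With a SparkSession named `spark` (an assumption), each batch could be
# parallelized separately so no single serialized task carries all 18 GB:
#
#   sc = spark.sparkContext
#   rdds = [sc.parallelize(b) for b in batches(string.splitlines(), 100_000)]
#   rdd = sc.union(rdds)
#   df = rdd.map(lambda line: (line,)).toDF(["value"])
#
# A more robust alternative (also an assumption): write the string to a file
# first and let Spark read it in a distributed fashion:
#
#   df = spark.read.text("path/to/file.txt")

# Demonstrate the batching helper on a small example.
lines = ["row1", "row2", "row3", "row4", "row5"]
chunks = list(batches(lines, 2))
```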

Error: org.apache.spark.SparkException: Job aborted due to stage failure: serialized task 74:0 was 18216129049 bytes, which exceeds max allowed: spark.rpc.message.maxSize (524288000 bytes)
