How to convert a PySpark RDD to a DataFrame?


The RDD is generated by spark.sparkContext.parallelize(string.splitlines()). I then tried toDF(), converting to a pandas DataFrame, repartitioning, coalesce, and various other techniques, but they all fail with a max RPC size error, and we cannot modify any of the cluster configurations. The object is huge, around 18216129049 bytes (roughly 18 GB), so I'm stuck on how to take it forward.

I want it converted to a DataFrame, whether pandas or PySpark, because I have to apply some transformations and write the result to a database.
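For context, a minimal sketch of what I understand the problem to be: parallelize() ships the entire driver-side string with the task, so one huge object blows past spark.rpc.message.maxSize. Splitting the lines into batches before parallelizing (or writing the string to a file and using spark.read.text) is one way around this; the batching helper below is plain Python, and the Spark usage in the comments is an assumption about the setup, not code from the question.

```python
def batches(lines, batch_size):
    """Yield successive chunks of `lines`, each with at most `batch_size` items."""
    for i in range(0, len(lines), batch_size):
        yield lines[i:i + batch_size]

# With a SparkSession named `spark` (an assumption), each batch could be
# parallelized separately so no single serialized task carries all 18 GB:
#
#   sc = spark.sparkContext
#   rdds = [sc.parallelize(b) for b in batches(string.splitlines(), 100_000)]
#   rdd = sc.union(rdds)
#   df = rdd.map(lambda line: (line,)).toDF(["value"])
#
# A more robust alternative (also an assumption): write the string to a file
# first and let Spark read it in a distributed fashion:
#
#   df = spark.read.text("path/to/file.txt")

# Demonstrate the batching helper on a small example.
lines = ["row1", "row2", "row3", "row4", "row5"]
chunks = list(batches(lines, 2))
```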

Error: org.apache.spark.SparkException: Job aborted due to stage failure: serialized task 74:0 was 18216129049 bytes, which exceeds max allowed: spark.rpc.message.maxSize (524288000 bytes)
