I am running a PySpark notebook in an Azure Databricks environment on an autoscaling cluster (2 to 32 workers). I have two dataframes, df1 and df2, and I am concatenating them with Pandas using the code below.
df1 -> 9 columns, around 11 million records
df2 -> exactly the same schema as df1, around 13 million records

I am concatenating them with pd.concat([df1, df2], ignore_index=True) and it works fine. Now, in order to distribute the work, I converted df1 and df2 to Koalas dataframes. But when I concatenate them with ks.concat([df1, df2], ignore_index=True), it always gives the error below.
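For reference, this is roughly what I am doing (df1 and df2 here are small stand-ins for the real 11M/13M-row frames, and I am using the legacy databricks.koalas package imported as ks):

```python
import pandas as pd
import databricks.koalas as ks

# Stand-ins for the real frames; both have the same schema.
df1 = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [4, 5], "value": ["d", "e"]})

# Pandas: concat takes a list of frames; ignore_index resets the row index.
pdf = pd.concat([df1, df2], ignore_index=True)

# Koalas: convert each frame, then concatenate the same way as a distributed job.
kdf1 = ks.from_pandas(df1)
kdf2 = ks.from_pandas(df2)
kdf = ks.concat([kdf1, kdf2], ignore_index=True)
print(kdf.shape)  # (5, 2) for the stand-in frames
```

The pandas call above completes without issue, but the Koalas call on the full-size frames fails with the error below.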
Job aborted due to stage failure: Task 5 in stage 226.0 failed 4 times, most recent failure: Lost task 5.3 in stage 226.0 (TID 17592) (10.54.144.21 executor 125): org.apache.spark.SparkException: Checkpoint block rdd_593_5 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant
Any help would be much appreciated.
Thanks, Nikesh