I was running a PySpark job (with Apache Hudi) on AWS EMR on EKS; the driver code looked like this:
import sys

from pyspark.sql import SparkSession

# SparkSession works as a context manager, so stop() is also
# called automatically when the with block exits.
with (SparkSession.builder
        .appName('App')
        .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
        .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension')
        .getOrCreate()) as spark:
    # Add a new column to my Hudi table
    spark.sql('alter table my_table add columns (my_date date)')
    # Merge a data set into my Hudi table
    spark.sql('merge into my_table ...')
    spark.stop()

print('FINISH')
sys.exit(0)
The job stays in the RUNNING state in EMR even though it has actually finished and exited. The Spark UI shows the job as finished, the output log shows FINISH printed as the last line of my script, and I've checked S3: the data modification completed. Yet the job-run state in EMR remains RUNNING until I cancel it manually.
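For clarity, the state I mean is the job run state reported by the EMR on EKS (emr-containers) API. A minimal sketch of how I check it and perform the manual cancellation with boto3 (the region, virtual cluster ID, and job run ID below are placeholders):

import boto3

# EMR on EKS client; the region and IDs are placeholders
emr = boto3.client('emr-containers', region_name='us-east-1')

# Fetch the current state of the job run
state = emr.describe_job_run(
    virtualClusterId='<virtual-cluster-id>',
    id='<job-run-id>',
)['jobRun']['state']
print(state)  # still prints RUNNING long after the driver exited

# The manual cancellation mentioned above
emr.cancel_job_run(
    virtualClusterId='<virtual-cluster-id>',
    id='<job-run-id>',
)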
The Spark History Server shows the job finished in 2.3 minutes, but in the AWS EMR console it kept running until I stopped it manually after 50 minutes.
Does anyone know what causes this problem?

