I am facing the below error while running a Spark action through an Oozie workflow on an EMR 5.14 cluster:

pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'"

My PySpark script runs fine when executed as a normal Spark job, but it fails when executed via the Oozie Spark action. PySpark program:

    from pyspark import SparkContext
    from pyspark.sql import SparkSession, HiveContext

    spark = SparkSession.builder.appName("PysparkTest").config("hive.support.quoted.identifiers", "none").enableHiveSupport().getOrCreate()
    sc = SparkContext.getOrCreate()
    sqlContext = HiveContext(sc)
    sqlContext.sql("show databases").show()

I have created a workflow.xml and job.properties taking reference from the LINK.
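For context, my workflow.xml follows the usual Spark-action layout, roughly like the sketch below (the workflow name, script path, and property placeholders here are illustrative, not my exact values):

    <workflow-app name="pyspark-test-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="spark-node"/>
        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>PysparkTest</name>
                <!-- For PySpark, the <jar> element points at the .py script -->
                <jar>${nameNode}/user/${wf:user()}/apps/pyspark/pyspark_test.py</jar>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>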

I copied all the Spark- and Hive-related configuration files into the same directory ($SPARK_CONF_DIR/). Hive is also configured to use MySQL for the metastore.

It would be great if you could help me figure out the problem I am facing when running this PySpark program through an Oozie Spark action.

Answer from Snigdhajyoti:

"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'" means the catalog jar it is trying to find is not in the Oozie sharelib spark directory.

Please add the following property in your job.properties file.

oozie.action.sharelib.for.spark=hive,spark,hcatalog
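For example, a job.properties along these lines should work; the host names and paths below are placeholders, adjust them for your cluster:

    nameNode=hdfs://<emr-master-host>:8020
    jobTracker=<emr-master-host>:8032
    oozie.use.system.libpath=true
    # Pull the hive, spark and hcatalog sharelib jars into the action classpath
    oozie.action.sharelib.for.spark=hive,spark,hcatalog
    oozie.wf.application.path=${nameNode}/user/${user.name}/apps/pyspark

You can also check what jars are actually available with `oozie admin -shareliblist spark`, and refresh the sharelib with `oozie admin -sharelibupdate` if you have changed it.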

Also can you please post the whole log?

And if possible, could you please run the same on EMR 5.29? I have faced a few jar issues on 5.26 and lower versions while running PySpark.