I am facing the below error while running a Spark action through an Oozie workflow on an EMR 5.14 cluster:
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'"
My PySpark script runs fine when executed as a normal Spark job, but fails when executed via Oozie. PySpark program:
from pyspark import SparkContext
from pyspark.sql import SparkSession, HiveContext

spark = SparkSession.builder.appName("PysparkTest").config("hive.support.quoted.identifiers", "none").enableHiveSupport().getOrCreate()
sc = SparkContext.getOrCreate()
sqlContext = HiveContext(sc)
sqlContext.sql("show databases").show()
I created the workflow.xml and job.properties by taking reference from the LINK.
I copied all the Spark- and Hive-related configuration files into the same directory ($SPARK_CONF_DIR/). Hive is also configured to use MySQL for the metastore.
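For context, a minimal Oozie Spark action definition for a PySpark script might look like the sketch below. This is an assumed layout, not taken from the original post: the application name PysparkTest, the script name PysparkTest.py, and the ${jobTracker}/${nameNode} parameters are placeholders that would be resolved from job.properties.

```xml
<workflow-app name="pyspark-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="spark-node"/>
  <action name="spark-node">
    <spark xmlns="uri:oozie:spark-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <master>yarn</master>
      <mode>cluster</mode>
      <name>PysparkTest</name>
      <!-- for a PySpark action, <jar> points at the .py script -->
      <jar>PysparkTest.py</jar>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```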
It would be great if you could help me figure out the problem I am facing when running this PySpark program through an Oozie Spark action.
The error Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog' means that the catalog jar Spark is trying to find is not in the Oozie sharelib's spark directory. Please add the following property to your job.properties file. Also, could you please post the whole log?
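The answer does not spell out the property itself; as an assumption, the setting commonly used to make the Hive jars from the sharelib available to a Spark action is the override sketched below — treat it as a guess at the intended fix, not a confirmed value.

```properties
# Hypothetical job.properties fragment (assumed, not quoted from the answer):
# load the system sharelib, and pull in the hive sharelib alongside spark
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark,hive
```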
And if possible, could you please run the same on EMR 5.29? I have faced a few jar issues on 5.26 and lower versions while running PySpark.