I am looking for help running Hop pipelines on a Spark cluster running on Kubernetes.
- I have a Spark master deployed with 3 worker nodes on Kubernetes.
- I am using the hop-run.sh command to run the pipeline on Spark running on Kubernetes.
I am facing the exception below:
java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
It looks like the fat jar is not being picked up by Spark when running the hop-run.sh command.
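For reference, this is a minimal sketch of the kind of hop-run.sh call I am using; the project name, pipeline path and run configuration name are placeholders:

```bash
# Minimal sketch -- project name, pipeline path and run configuration name are placeholders.
# 'SparkRunConfig' is assumed to be a Beam Spark pipeline run configuration defined in the
# Hop metadata, with the fat jar location set in its options.
./hop-run.sh \
  --project my-project \
  --file /path/to/pipelines/my-pipeline.hpl \
  --runconfig SparkRunConfig
```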
I also tried running the same with the spark-submit command, but I am not sure how to pass references to the pipelines and workflows to Spark running on Kubernetes, even though I am able to add the fat jar to the classpath (as can be seen in the logs).
Any kind of help is appreciated. Thanks!
Could it be that you are using version 1.0? We had a missing jar for the S3 VFS, which has been resolved in 1.1: https://issues.apache.org/jira/browse/HOP-3327
For more information on how to use spark-submit, take a look at the following documentation: https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit
The locations of the fat jar, the pipeline and the required metadata export can all be VFS locations, so there is no need to place those on the cluster itself.
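For illustration, a spark-submit call for this setup roughly has the shape below; the master URL, the s3:// locations and the run configuration name are placeholders, so double-check the class name and argument order against the documentation linked above:

```bash
# Rough sketch -- all URLs and names below are placeholders.
# MainBeam takes the pipeline, the exported Hop metadata JSON and the name of the
# pipeline run configuration as arguments; these and the fat jar can all be VFS
# locations (for example s3:// URLs), so nothing needs to be copied to the cluster.
spark-submit \
  --master spark://<spark-master-host>:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  s3://my-bucket/hop/hop-fat.jar \
  s3://my-bucket/hop/pipelines/my-pipeline.hpl \
  s3://my-bucket/hop/metadata.json \
  'Spark run configuration'
```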