Unable to run Hop pipelines on Spark running on Kubernetes


I am looking for help running Hop pipelines on a Spark cluster that runs on Kubernetes.

  1. I have a Spark master deployed with 3 worker nodes on Kubernetes.
  2. I am using the hop-run.sh command to run the pipeline on that Spark cluster.

I am getting the following exception: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder

It looks like the fat jar is not being passed to Spark when I run the hop-run.sh command.


I also tried running the same pipeline with the spark-submit command. There I am able to add the fat jar to the classpath (this can be seen in the logs), but I am not sure how to pass references to pipelines and workflows to Spark running on Kubernetes.

Any kind of help is appreciated. Thanks.


There is 1 solution below

HansVA

Could it be that you are using version 1.0? That version was missing a jar for S3 VFS, which has been resolved in 1.1: https://issues.apache.org/jira/browse/HOP-3327

For more information on how to use spark-submit you can take a look at the following documentation: https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit

The locations of the fat jar, the pipeline, and the required metadata export can all be VFS locations, so there is no need to place those files on the cluster itself.
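
To make the spark-submit route a bit more concrete, here is a rough sketch in Python that only assembles the command line; it does not run anything. It assumes the argument order I recall from the documentation linked above (fat jar, then pipeline file, then metadata export JSON, then the name of the pipeline run configuration) and the main class org.apache.hop.beam.run.MainBeam, so please check both against the page for your Hop version. The master URL, bucket paths, and run configuration name are hypothetical placeholders, and the helper name build_spark_submit_command is made up for this example.

```python
import shlex

# Main class as I recall it from the Hop spark-submit documentation.
# Treat this as an assumption and verify it for your Hop version.
HOP_BEAM_MAIN_CLASS = "org.apache.hop.beam.run.MainBeam"


def build_spark_submit_command(master_url, fat_jar, pipeline, metadata_json, run_config_name):
    """Assemble (but do not execute) a spark-submit command for a Hop pipeline.

    Assumed application-argument order: pipeline file, metadata export JSON,
    name of the Spark pipeline run configuration.
    """
    return [
        "spark-submit",
        "--master", master_url,
        "--class", HOP_BEAM_MAIN_CLASS,
        fat_jar,          # application jar: the Hop fat jar
        pipeline,         # the .hpl pipeline to run
        metadata_json,    # metadata exported from Hop
        run_config_name,  # run configuration defined in that metadata
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    command = build_spark_submit_command(
        master_url="spark://spark-master:7077",
        fat_jar="s3://example-bucket/hop/hop-fat.jar",
        pipeline="s3://example-bucket/hop/my-pipeline.hpl",
        metadata_json="s3://example-bucket/hop/hop-metadata.json",
        run_config_name="Spark",
    )
    # Print a shell-quoted version that can be reviewed before running it.
    print(shlex.join(command))
```

Printing the command instead of executing it lets you compare it with the documentation first. Whether Spark can fetch the application jar from a remote location depends on how your cluster's file system support is configured, so that part is worth testing separately.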