I am looking for help running Hop pipelines on a Spark cluster running on Kubernetes.
- I have a Spark master deployed with 3 worker nodes on Kubernetes.
- I am using the hop-run.sh command to run the pipeline on Spark running on Kubernetes.
I am facing the exception below:
java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.s3.AmazonS3ClientBuilder
It looks like the fat jar is not being picked up by Spark when running the hop-run.sh command.
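For reference, this is a minimal sketch of the kind of hop-run.sh call I am using; the project name, pipeline path and run configuration name are placeholders:

```bash
# Minimal sketch -- project name, pipeline path and run configuration name are placeholders.
# 'SparkRunConfig' is assumed to be a Beam Spark pipeline run configuration defined in the
# Hop metadata, with the fat jar location set in its options.
./hop-run.sh \
  --project my-project \
  --file /path/to/pipelines/my-pipeline.hpl \
  --runconfig SparkRunConfig
```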
I also tried running the same with the spark-submit command, but I am not sure how to pass references to the pipelines and workflows to Spark running on Kubernetes, even though I am able to add the fat jar to the classpath (as can be seen in the logs).
Any kind of help is appreciated. Thanks!
Could it be that you are using version 1.0? We had a missing jar for the S3 VFS, which has been resolved in 1.1: https://issues.apache.org/jira/browse/HOP-3327
For more information on how to use spark-submit, take a look at the following documentation: https://hop.apache.org/manual/latest/pipeline/pipeline-run-configurations/beam-spark-pipeline-engine.html#_running_with_spark_submit
The locations of the fat jar, the pipeline and the required metadata export can all be VFS locations, so there is no need to place those on the cluster itself.
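For illustration, a spark-submit call for this setup roughly has the shape below; the master URL, the s3:// locations and the run configuration name are placeholders, so double-check the class name and argument order against the documentation linked above:

```bash
# Rough sketch -- all URLs and names below are placeholders.
# MainBeam takes the pipeline, the exported Hop metadata JSON and the name of the
# pipeline run configuration as arguments; these and the fat jar can all be VFS
# locations (for example s3:// URLs), so nothing needs to be copied to the cluster.
spark-submit \
  --master spark://<spark-master-host>:7077 \
  --class org.apache.hop.beam.run.MainBeam \
  s3://my-bucket/hop/hop-fat.jar \
  s3://my-bucket/hop/pipelines/my-pipeline.hpl \
  s3://my-bucket/hop/metadata.json \
  'Spark run configuration'
```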