Run a Hop workflow with Spark submit in a Hadoop/Yarn cluster


I have a Hadoop/Yarn cluster with 3 machines running on Ubuntu Server. I'm trying to run an Apache Hop workflow using Spark submit on them, but I can't seem to find the right way to pass the workflow as an argument. I built the Hop fat jar and exported the metadata. My $PROJECT_HOME is set correctly.

I tried every path variant I could think of: prefixing the path with file://, referencing it through a variable ($PROJECT_HOME), and so on. This is one of the commands I tried:

./spark-submit \
  --class org.apache.hop.beam.run.MainBeam \
  --driver-java-options '-DPROJECT_HOME=$PROJECT_HOME' \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 3g \
  /home/hduser/dados/testing/test.jar \
  file://opt/hop/config/projects/armazem_de_dados/DW\ Território/ETL/WF_DW_TERRITORIO.hwf \
  file://home/hduser/dados/testing/meta.json \
  local
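One thing I noticed while debugging: in a file:// URI with only two slashes, the first path segment is parsed as the URI's host/authority, so "opt" would be stripped off the path. A quick Python sketch just to illustrate the URI parsing (the short path below is made up, not my real one):

```python
from urllib.parse import urlparse

# Two slashes: "opt" is consumed as the authority, not part of the path
u = urlparse("file://opt/hop/config/projects/wf.hwf")
print(u.netloc)  # → opt
print(u.path)    # → /hop/config/projects/wf.hwf

# Three slashes: empty authority, the full absolute path is preserved
v = urlparse("file:///opt/hop/config/projects/wf.hwf")
print(v.netloc)  # → (empty string)
print(v.path)    # → /opt/hop/config/projects/wf.hwf
```

I don't know whether Hop's VFS layer parses the URI the same way, but the quoted error does show it normalizing my path to file:///opt/..., so I also tried three slashes, with the same result.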

This is the error I always get:

Error running Beam pipeline: Error reading from file file://opt/hop/config/projects/armazem_de_dados/DW Território/ETL/WF_DW_TERRITORIO.hwf
java.io.IOException: Error reading from file file://opt/hop/config/projects/armazem_de_dados/DW Território/ETL/WF_DW_TERRITORIO.hwf
    at org.apache.hop.beam.run.MainBeam.readFileIntoString(MainBeam.java:165)
    at org.apache.hop.beam.run.MainBeam.main(MainBeam.java:86)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:738)
Caused by: org.apache.hop.core.exception.HopFileException:

org.apache.commons.vfs2.FileNotFoundException: Could not read from "file:///opt/hop/config/projects/armazem_de_dados/DW Território/ETL/WF_DW_TERRITORIO.hwf" because it is not a file.


And this is the error I get when I leave out the file:// prefix:

Caused by: java.io.FileNotFoundException: /home/hduser/dados/cluster_local/hdfs/tmp/nm-local-dir/usercache/hduser/appcache/application_1707998290673_0008/container_1707998290673_0008_02_000001/DW Território/ETL/WF_DW_TERRITORIO.hwf (No such file or directory)
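That path looks like part of my workflow path being resolved relative to the YARN container's working directory. As far as I understand, a relative path in a JVM resolves against the process's current working directory, which in cluster mode is the container's appcache directory. A minimal illustration (the directory below is copied from the error, with the IDs shortened to X):

```python
import os

# Hypothetical YARN container working directory (IDs abbreviated)
container_cwd = ("/home/hduser/dados/cluster_local/hdfs/tmp/nm-local-dir/"
                 "usercache/hduser/appcache/application_X/container_X")

# Joining a relative path onto it reproduces the shape of the error path
rel = "DW Território/ETL/WF_DW_TERRITORIO.hwf"
print(os.path.join(container_cwd, rel))
```

What I don't understand is why only the "DW Território/ETL/..." tail of my absolute path survives, so I may be mangling the argument somewhere (the escaped space in the directory name is my main suspect).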
