I have multiple Pig jobs in a GCP Dataproc workflow template with dependencies, as below:
export env=dev
export REGION=us-east4
gcloud dataproc workflow-templates create test1 --region=$REGION
gcloud dataproc workflow-templates set-cluster-selector test1 \
--region=$REGION \
--cluster-labels=goog-dataproc-cluster-uuid=XXXXXXXXXXXXXXXXXXX
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job1.sh \
--region=$REGION \
--step-id=pig_job1 \
--workflow-template=test1
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job2.sh \
--region=$REGION \
--step-id=pig_job2 \
--start-after pig_job1 \
--workflow-template=test1
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job3.sh \
--region=$REGION \
--step-id=pig_job3 \
--start-after pig_job2 \
--workflow-template=test1
gcloud dataproc workflow-templates instantiate test1 --region=$REGION
Is there any provision to execute a GCP workflow from the point of failure?
What I mean is: suppose step-id=pig_job2 fails for some reason. Is there any way to execute this workflow starting from step-id=pig_job2 only, without creating a new workflow?
I tried the approach from https://stackoverflow.com/questions/71716824/gcp-workflows-easy-way-to-re-run-failed-execution, but it was not useful.
I am expecting step-id=pig_job2 to be executed directly, followed by the remaining jobs as per the dependencies.
There is no built-in provision to resume a Dataproc workflow template from the step that failed. Each run of gcloud dataproc workflow-templates instantiate starts a fresh workflow and executes every step again from the beginning; the instantiate command has no resume flag. You can confirm the available flags with the -h or --help flag, or in the official documentation for gcloud dataproc workflow-templates instantiate.
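If you only want the remaining steps to run, the closest workaround is to create a small, throwaway template that contains just the failed step and the steps that depend on it, and instantiate that against the same cluster selector. Below is a minimal sketch based on the commands in your question; the template name test1-resume is only an example, and note that this does mean creating a second template, since the original one cannot be partially re-run.

export REGION=us-east4
# New template holding only the failed step and its dependents
gcloud dataproc workflow-templates create test1-resume --region=$REGION
gcloud dataproc workflow-templates set-cluster-selector test1-resume \
--region=$REGION \
--cluster-labels=goog-dataproc-cluster-uuid=XXXXXXXXXXXXXXXXXXX
# Re-add the step that failed...
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job2.sh \
--region=$REGION \
--step-id=pig_job2 \
--workflow-template=test1-resume
# ...and the steps that depend on it, keeping the same ordering
gcloud dataproc workflow-templates add-job pig \
--file=gs://dnb-p2d-d-sto-g-inbound/steps/pig_job3.sh \
--region=$REGION \
--step-id=pig_job3 \
--start-after pig_job2 \
--workflow-template=test1-resume
# Run only the remaining part of the pipeline
gcloud dataproc workflow-templates instantiate test1-resume --region=$REGION

Once the run succeeds, you can remove the temporary template with gcloud dataproc workflow-templates delete test1-resume --region=$REGION.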
If you need more advanced retry and error handling capabilities, consider an external orchestrator such as Apache Airflow (available on GCP as Cloud Composer) or another orchestration system that integrates with Google Cloud Platform; Airflow, for example, supports per-task retries and lets you clear a failed task so that only it and its downstream tasks are re-run. The Stack Overflow link you provided is about Cloud Workflows, a different product, and the custom re-run approach discussed there is not a built-in feature of Google Cloud Dataproc workflow templates.