EMR: unable to run steps in parallel

I am trying to run several EMR steps in parallel. I have seen other questions about this issue on SO and searched for other options. Here is what I have tried so far:

  • Configure CapacityScheduler with set of queues
  • Configure FairScheduler
  • Use AWS Data Pipeline with PARALLEL_FAIR_SCHEDULING and PARALLEL_CAPACITY_SCHEDULING

None of this worked for me. YARN created all the queues properly, and the jobs were submitted to different queues, but EMR still ran only a single step at a time (one step was RUNNING and the rest were PENDING).

I also saw in one of the answers that steps are meant to be sequential, but that you can put several jobs inside a single step. I was not able to find a way to do this, and according to the UI there is no option for it.

I have not tried submitting jobs to the YARN cluster directly (Submit Hadoop Jobs Interactively). I want to submit jobs through the AWS API, and I haven't found a way to do that from the API.

This is my configuration for CapacityScheduler: (image: CapacityScheduler)

This is the steps configuration: (image: StepsConfiguration)

There is 1 best solution below

Answer by 1ambda

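This might be late, but I hope it is helpful.

Spark provides an option that specifies whether the caller (here, the EMR step) waits for the Spark application to complete after submitting it. In YARN cluster mode this option defaults to true, so each step blocks until its application finishes. If you set it to false, the EMR step submits the application and returns immediately, so the next step can start while the earlier applications are still running on YARN.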

spark.yarn.submit.waitAppCompletion: "false"
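
Since the question asks about submitting through the AWS API, here is a minimal sketch of how that setting could be passed per step with boto3. It assumes the jobs are Spark applications launched through `command-runner.jar` in cluster deploy mode; the bucket path, step names, cluster ID, and the helper names `build_spark_step` and `submit_steps` are hypothetical placeholders, not something from the original setup.

```python
def build_spark_step(name, app_path, app_args=()):
    """Build an EMR step definition that submits a Spark app without waiting for it."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                # Return right after submission instead of blocking the step.
                "--conf", "spark.yarn.submit.waitAppCompletion=false",
                app_path,
                *app_args,
            ],
        },
    }


def submit_steps(cluster_id, steps, region="us-east-1"):
    """Send the steps to a running cluster (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


# Hypothetical example: three jobs that should run side by side on YARN.
steps = [
    build_spark_step(f"job-{i}", "s3://my-bucket/jobs/app.py", ["--part", str(i)])
    for i in range(3)
]

# submit_steps("j-XXXXXXXXXXXXX", steps)  # placeholder cluster ID
```

One consequence to keep in mind: because each step returns as soon as the application is submitted, the step status in EMR reflects only the submission, not the outcome of the Spark application. The actual result has to be checked in YARN (for example in the ResourceManager UI or the application logs).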