We have 6 NodeManager Linux machines in our Hadoop cluster, and each NodeManager is co-located with a DataNode.
The first 3 NodeManager machines have 32 cores each, and the other 3 NodeManager machines have 96 cores each.
Because of the different core counts, we configured YARN config groups in Ambari (Manage Config Groups) as follows (a sketch of the equivalent property override is shown after the list):
first group – 32 cores for NodeManagers 01 / 02 / 03
second group – 96 cores for NodeManagers 04 / 05 / 06
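To make the group definitions concrete: as far as we understand, the NodeManager vcore count is controlled by yarn.nodemanager.resource.cpu-vcores, so the two config groups effectively apply the following yarn-site.xml overrides (a sketch only; just the vcore property is shown, memory settings omitted):

  <!-- group 1: NodeManagers 01 / 02 / 03 -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>32</value>
  </property>

  <!-- group 2: NodeManagers 04 / 05 / 06 -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>96</value>
  </property>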
After applying these settings in the Ambari YARN configuration, we restarted the YARN service so the config group settings would take effect.
We then ran our Spark application, and executors were launched on all NodeManager machines.
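For reference, the application is submitted roughly like this (a simplified sketch; the executor count, core/memory sizes, and application name are illustrative, not our exact values):

  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 40 \
    --executor-cores 4 \
    --executor-memory 8g \
    our_spark_app.jar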
We expected the ResourceManager to allocate cores according to our config group settings, but when we looked at the YARN cluster Nodes page we noticed that the ResourceManager does not effectively utilize the 96 cores on NodeManagers 04 / 05 / 06.
As a result, many executors failed to start because of "not enough" core resources, even though additional cores are available on the NodeManager 04 / 05 / 06 machines.
Here is an example:
                     VCores Used    VCores Available    Configured total (config group)
  node manager 1          30                2                        32
  node manager 2          31                1                        32
  node manager 3          31                1                        32
  node manager 4          34               62                        96
  node manager 5          19               77                        96
  node manager 6          24               72                        96
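The same per-node numbers can also be checked from the command line with the YARN CLI (the node ID below is illustrative):

  yarn node -list -all
  yarn node -status nodemanager04.example.com:45454

The node status output reports CPU-Used and CPU-Capacity per NodeManager, which should reflect the same values as the Nodes page.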
We expected the VCores used on NodeManagers 04 / 05 / 06 to be close to 96, as configured in the YARN config group, but it seems the ResourceManager is not aware of the 96 cores on those machines.
What is the reason the ResourceManager does not use most of the cores on the 96-core machines? Is it because some additional tuning is missing, or something else?
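One guess on our side is the CapacityScheduler resource calculator: as far as we understand, with the default DefaultResourceCalculator the scheduler considers only memory and ignores vcores when allocating containers, so the per-node vcore settings would not be fully honored. If that is the relevant knob, the change would be this capacity-scheduler.xml property (a sketch only; we have not confirmed this is our issue):

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

Is this the right direction, or is there another setting we should check?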