Why does the YARN resource manager not use the maximum cores from the node manager machines?


We have 6 node manager Linux machines in a Hadoop cluster, and all of them are co-hosted with the data nodes.

The first 3 node manager machines have 32 cores each, and the other 3 node manager machines have 96 cores each.

Because the machines have different core counts, we configured YARN config groups in Ambari (Manage Config Groups) as follows:

  1. first group – 32 cores for node managers 01 / 02 / 03

  2. second group – 96 cores for node managers 04 / 05 / 06

After applying the above settings in the Ambari YARN configuration, we restarted the YARN service so that the config group settings would take effect.

We then ran our Spark application, and executors are running on all node manager machines.

We expected the resource manager to allocate cores according to our config group settings.

But when we look at the YARN cluster nodes page, we see that the resource manager does not effectively utilize the 96 cores on node managers 04 / 05 / 06.

As a result, many executors fail to start because there are "not enough" core resources, even though we have spare cores on node managers 04 / 05 / 06.

Here is an example:

                    Vcores used   Vcores available   Vcores total (configured in config group)
 node manager 1     30            2                  32
 node manager 2     31            1                  32
 node manager 3     31            1                  32
 node manager 4     34            62                 96
 node manager 5     19            77                 96
 node manager 6     24            72                 96
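
To make the imbalance explicit, the figures above can be summed per config group. This is only a minimal Python sketch of the arithmetic: the numbers are copied from the table, and the `summarize` helper is a hypothetical name introduced here for illustration.

```python
# (node, vcores used, vcores configured in the config group), copied from the table.
nodes = [
    ("node manager 1", 30, 32),
    ("node manager 2", 31, 32),
    ("node manager 3", 31, 32),
    ("node manager 4", 34, 96),
    ("node manager 5", 19, 96),
    ("node manager 6", 24, 96),
]


def summarize(rows):
    """Return (used, free, percent used) for a group of nodes."""
    used = sum(u for _, u, _ in rows)
    total = sum(t for _, _, t in rows)
    return used, total - used, round(100 * used / total, 1)


small = summarize(nodes[:3])  # the 32-core group
large = summarize(nodes[3:])  # the 96-core group

print("32-core group: used=%d free=%d (%.1f%% used)" % small)
print("96-core group: used=%d free=%d (%.1f%% used)" % large)
```

With these figures the 32-core group is about 95.8% used (92 of 96 vcores), while the 96-core group is only about 26.7% used (77 of 288 vcores), leaving 211 vcores free.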

We expected the Vcores used on node managers 04 / 05 / 06 to be close to 96, as configured in the YARN config group, but it seems that the resource manager is not making use of the 96 cores on those machines.

So why does the resource manager not take most of the cores from the machines with 96 cores? Is it because some additional tuning has not been configured, or is there another reason?
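
One way to cross-check the numbers on the nodes page is to read the same data from the resource manager's REST endpoint (`/ws/v1/cluster/nodes`) and compute each node's total as used plus available. The sketch below only parses a response body. It assumes the `nodeHostName`, `usedVirtualCores` and `availableVirtualCores` fields of that API, and the host names and values in `sample` are hypothetical placeholders shaped like the table above.

```python
import json


def vcores_by_node(payload):
    """Map each node's host name to (used, available, total) vcores."""
    # An empty cluster may report "nodes": null, so fall back to an empty dict.
    nodes = (payload.get("nodes") or {}).get("node", [])
    result = {}
    for n in nodes:
        used = n["usedVirtualCores"]
        available = n["availableVirtualCores"]
        result[n["nodeHostName"]] = (used, available, used + available)
    return result


# Hypothetical response body; host names are made up for illustration.
sample = json.loads("""
{"nodes": {"node": [
  {"nodeHostName": "nm01.example.com", "usedVirtualCores": 30, "availableVirtualCores": 2},
  {"nodeHostName": "nm04.example.com", "usedVirtualCores": 34, "availableVirtualCores": 62}
]}}
""")

for host, (used, available, total) in sorted(vcores_by_node(sample).items()):
    print(host, "used:", used, "available:", available, "total:", total)
```

If the total reported for node managers 04 / 05 / 06 is 96, the resource manager has registered the config group value, and the low usage would have to come from something other than the vcore setting itself.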
