I am running the WRF-ARW model, which seems to perform best when compiled and run in hybrid MPI-OpenMP mode.
I am working on a Lenovo cluster with 36 cores per node, managed by the LSF 10.2 scheduler.
I read that the best performance is achieved with 4 OpenMP threads per MPI process.
Here is the relevant part of my job script:
#BSUB -J WRF_run
#BSUB -n 224
#BSUB -x
#BSUB -R "span[ptile=9]"
#BSUB -q my_queue
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BRANCH_COUNT=25
export I_MPI_HYDRA_COLLECTIVE_LAUNCH=1
export OMP_NUM_THREADS=4
My understanding is that the 224 MPI processes will be split into blocks of 9 MPI processes per node, and each block will spawn 9*4=36 threads, one per core of the node.
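To make the arithmetic explicit, here is a minimal sketch of the layout I expect. It only restates the numbers from the script; the variable names are made up for illustration, and it assumes LSF places 9 ranks on each node until the ranks run out (I have not verified that this is how the scheduler actually distributes them).

```shell
#!/bin/bash
# Values copied from the job script (variable names are illustrative only).
mpi_ranks=224        # from: #BSUB -n 224
ranks_per_node=9     # from: span[ptile=9]
omp_threads=4        # from: OMP_NUM_THREADS=4
cores_per_node=36    # hardware: 36 cores per node

# Nodes needed, rounded up because 224 is not a multiple of 9.
nodes=$(( (mpi_ranks + ranks_per_node - 1) / ranks_per_node ))

# Threads on a fully populated node, and threads over the whole job.
threads_per_node=$(( ranks_per_node * omp_threads ))
total_threads=$(( mpi_ranks * omp_threads ))

# Ranks left for the last node, assuming all the others hold 9 each.
ranks_on_last_node=$(( mpi_ranks - (nodes - 1) * ranks_per_node ))

echo "nodes=$nodes"
echo "threads_per_node=$threads_per_node (cores_per_node=$cores_per_node)"
echo "total_threads=$total_threads"
echo "ranks_on_last_node=$ranks_on_last_node"
```

Under that assumption the job spans 25 nodes, and the last node holds 8 ranks rather than 9, because 224 is not a multiple of 9.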
It is not clear to me whether it is correct to ask for the number of MPI processes:
#BSUB -n 224
or to ask for the total number of threads
#BSUB -n 896 (224*4)
Thank you