How to get reliable results on SLURM for performance tests?

I want to benchmark two algorithms, ALG1 and ALG2, against each other using SLURM. Both binaries, ALG1.o and ALG2.o, contain serial implementations of the algorithms. I know that ALG1.o is always faster than ALG2.o. Since I have thousands of experiments, I want to use our cluster to run them. Each node on the cluster has 128 CPUs and 1 TB of memory, but there are only a few such nodes.

What I would like to achieve is to submit my jobs to the cluster and specify that each job gets one CPU and a certain amount of memory for itself. For the duration of the run, these resources should be used only by my job.

First question: Can I expect reliable running times using this approach? Currently I'm not getting reliable results: sometimes ALG2.o is faster than ALG1.o, which is never the case when I run the very same instances on different local machines.

Second question: The script I'm using to send the jobs using the sbatch command is as follows:

#!/usr/bin/bash
#SBATCH --ntasks 128               # I want to reserve a whole node for myself if possible.
#SBATCH --mem-per-cpu=4G           #Example
#SBATCH -o ALG1.%J                 #Job output
#SBATCH -t 100:00:00               #Max wall time for entire job

#SBATCH -p myPartition

##SBATCH --exclusive               # <-- This does not work on my cluster!!
#SBATCH --constraint="Xeon&Gold6338"  #To access only nodes on the cluster with the same specs. For sure, every node with this CPU has the same specs.

# Define srun arguments:
srun="srun -n1 -N1 --exclusive --time=05:00:00"

parallel -N 1 --delay .2 -j $SLURM_NTASKS -a /pathto/myFileContainingTheInstances.txt eval "$srun ./ALG1.o {}"

Why might I be getting unreliable running times with this script? All 'parallel' does is read the file myFileContainingTheInstances.txt line by line and pass each line as a list of arguments to ALG1.o. I then submit the same script for ALG2.o.
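To make clear what I mean by "line by line", here is a minimal dry-run sketch of the per-line expansion. It only prints the command that would be launched for each instance instead of running it. The helper name print_commands is hypothetical (it is not part of SLURM or GNU parallel), skipping blank lines is a choice of this sketch, and unlike the real script it prints sequentially rather than starting up to $SLURM_NTASKS steps at once.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the job-step command for each line of an instance file.
# print_commands is a hypothetical helper used only for illustration.
print_commands() {
    local instance_file="$1"   # file with one argument list per line
    local binary="$2"          # e.g. ./ALG1.o or ./ALG2.o
    # Same srun arguments as in the batch script above.
    local srun_cmd="srun -n1 -N1 --exclusive --time=05:00:00"
    local line
    # Read line by line; the "|| [ -n ... ]" part also handles a last line
    # that is not terminated by a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skipping blank lines is an assumption of this sketch.
        [ -z "$line" ] && continue
        printf '%s %s %s\n' "$srun_cmd" "$binary" "$line"
    done < "$instance_file"
}
```

Calling print_commands /pathto/myFileContainingTheInstances.txt ./ALG1.o lists one srun line per instance, which is what I expect each parallel slot to execute.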
