MPI Nested Loop - Division of Iterations Among Processes


I currently have this (poorly written) MPI code that, for brevity, just needs to iterate over two variables, i and l. The way I currently have it set up doesn't take advantage of the symmetry of the system I am iterating over. The MPI parallelization is on the i variable: I just split the values i can take among the different processes.

int iter_per_process = n_states / size;
int remainder = n_states % size;   /* currently unused: this split assumes n_states divides evenly by size */
int iter_start = rank * iter_per_process;
int iter_end = iter_start + iter_per_process;

MPI_Barrier(MPI_COMM_WORLD);

for(i = iter_start; i < iter_end; i++){
    for(l=0; l<n_states; l++){

So, for example, suppose n_states = 10 and I have 5 MPI processes (yes, I am aware that n_states needs to be divisible by the number of processes for this method to work; I'm in the process of getting rid of this dependence, as sketched below). The first process would take i = 0, 1 and all values of l, the next would take i = 2, 3 and all values of l, and so on, up to i = 8, 9 for the last process.
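For reference, something like the following is roughly what I have in mind for dropping the divisibility requirement (untested sketch, using the same variables as above): the first remainder ranks each take one extra value of i.

int iter_per_process = n_states / size;
int remainder = n_states % size;
int iter_start, iter_end;
if (rank < remainder) {
    /* the first `remainder` ranks each handle one extra value of i */
    iter_start = rank * (iter_per_process + 1);
    iter_end = iter_start + iter_per_process + 1;
} else {
    iter_start = rank * iter_per_process + remainder;
    iter_end = iter_start + iter_per_process;
}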

It turns out that in this system I don't need to iterate over every (i, l) pair, because the calculation for (l, i) is effectively the same as for (i, l), so skipping the duplicates saves time. To take advantage of this symmetry I can change the bounds of the inner loop:

for(i = iter_start; i < iter_end; i++){
    for(l=i; l<n_states; l++){

This inevitably leaves the processes with very different workloads and provides essentially no speedup, because rank 0 keeps almost the same workload it had before: with n_states = 10 and 5 processes, rank 0 runs 10 + 9 = 19 inner iterations while the last rank runs only 2 + 1 = 3.

I was thinking I might be able to split up the number of loop iterations each process runs so that every process has a balanced workload. I am unsure how to actually go about splitting it up while maintaining the nested loop (this is quite important to keep, as it helps with indexing later on).
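One direction I have been considering (just a sketch, I have not checked the balance carefully) is to hand out values of i round-robin instead of in contiguous blocks, which keeps the nested loop intact and gives every process a mix of long (small i) and short (large i) inner loops:

/* Sketch: cyclic distribution of i; rank r handles i = r, r + size, r + 2*size, ...
   The i values are disjoint across ranks, so each (i, l) pair with l >= i
   should still be visited exactly once. */
for (i = rank; i < n_states; i += size) {
    for (l = i; l < n_states; l++) {
        /* same loop body as before */
    }
}

But I am not sure whether this is the right way to go, or whether it will break the indexing I rely on later, since the i values on a given process are no longer contiguous.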

I am mainly worried about ensuring each (i, l) pair is computed exactly once, to avoid duplicate calculations and incorrect indexing down the line. Any pointers would be appreciated!
