MPI - Equivalent of MPI_SENDRECV with asynchronous functions


I know that MPI_SENDRECV allows us to avoid deadlocks (which can occur with the classic blocking MPI_SEND and MPI_RECV functions).

I would like to know if MPI_SENDRECV(sent_to_process_1, receive_from_process_0) is equivalent to:

MPI_ISEND(sent_to_process_1, request1)
MPI_IRECV(receive_from_process_0, request2)
MPI_WAIT(request1)
MPI_WAIT(request2)

with the asynchronous MPI_ISEND and MPI_IRECV functions?

From what I have seen, MPI_ISEND and MPI_IRECV create a fork (i.e. 2 processes). So if I follow this logic, the first call to MPI_ISEND generates 2 processes: one does the communication and the other calls MPI_IRECV, which itself forks into 2 processes.

But once the communication of first MPI_ISEND is finished, does the second process call MPI_IRECV again? With this logic, the above equivalent doesn't seem to be valid...

Maybe I should change to this:

MPI_ISEND(sent_to_process_1, request1)
MPI_WAIT(request1)
MPI_IRECV(receive_from_process_0, request2)
MPI_WAIT(request2)

But I think this could also create deadlocks.

Could anyone give me another solution using MPI_ISEND, MPI_IRECV and MPI_WAIT that has the same behaviour as MPI_SENDRECV?


There are 2 best solutions below

NoseKnowsAll (best answer)

How I usually do this on node i communicating with node i+1:

mpi_isend(send_to_process_iPlus1, requests(1))
mpi_irecv(recv_from_process_iPlus1, requests(2))
...
mpi_waitall(2, requests)

Ordering your commands this way with non-blocking communication allows you (during the ... above) to perform any computation that does not touch the send/recv buffers while the communication is in progress. Overlapping computation with communication is often crucial for maximizing performance.

mpi_sendrecv, on the other hand (while avoiding any deadlock issues), is still a blocking operation. Thus, your program must remain in that routine during the entire send/recv process.

Final points: you can start more than 2 requests and wait on all of them in the same way as with 2 requests. For instance, it's quite easy to start communication with node i-1 as well and wait on all 4 of the requests. With mpi_sendrecv you must always have a paired send and receive; what if you only want to send?

Wesley Bland

There are some dangerous lines of thought in the question and other answers. When you start a non-blocking MPI operation, the MPI library doesn't create a new process/thread/etc. You're thinking of something more like an OpenMP parallel region, I believe, where new threads/tasks are created to do some work.

In MPI, starting a non-blocking operation is like telling the MPI library that you have some things that you'd like to get done whenever MPI gets a chance to do them. There are lots of equally valid options for when they are actually completed:

  1. It could be that they all get done later when you call a blocking completion function (like MPI_WAIT or MPI_WAITALL). These functions guarantee that when the blocking completion call is done, all of the requests that you passed in as arguments are finished (in your case, the MPI_ISEND and the MPI_IRECV). Regardless of when the operations actually take place (see next few bullets), you as an application can't consider them done until they are actually marked as completed by a function like MPI_WAIT or MPI_TEST.
  2. The operations could get done "in the background" during another MPI operation. For instance, if you do something like the code below:

    MPI_Isend(..., MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(..., MPI_COMM_WORLD, &req[1]);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    

    The MPI_ISEND and the MPI_IRECV would probably actually do the data transfers in the background during the MPI_BARRIER. This is because, as an application, you are transferring "control" to the MPI library during the MPI_BARRIER call. This lets the library make progress on any ongoing MPI operation that it wants. Most likely, by the time the MPI_BARRIER completes, so have most other operations that were started before it.

  3. Some MPI libraries allow you to specify that you want a "progress thread". This tells the MPI library to start up another thread (note that thread != process) in the background that will actually do the MPI operations for you while your application continues in the main thread.

Remember that all of these in the end require that you actually call MPI_WAIT or MPI_TEST or some other function like it to ensure that your operation is actually complete, but none of these spawn new threads or processes to do the work for you when you call your nonblocking functions. Those really just act like you stick them on a list of things to do (which in reality, is how most MPI libraries implement them).
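The "list of things to do" picture can be sketched as a toy model in plain C. This is not MPI and not how any particular MPI library is written; every name in it (toy_library, toy_request, toy_istart, toy_progress, toy_wait) is invented for illustration, and a memcpy stands in for the actual data transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PENDING 8

/* One outstanding operation: copy `count` ints from src to dst. */
typedef struct {
    const int *src;
    int *dst;
    size_t count;
    bool done;
} toy_request;

/* The library's to-do list. */
typedef struct {
    toy_request *pending[TOY_MAX_PENDING];
    size_t n_pending;
} toy_library;

/* "Non-blocking start": record the work and return immediately.
 * No thread or process is created. Returns false if the list is full,
 * in which case the request was not started. */
static bool toy_istart(toy_library *lib, toy_request *req,
                       const int *src, int *dst, size_t count)
{
    assert(lib != NULL && req != NULL);
    if (lib->n_pending == TOY_MAX_PENDING) {
        return false;
    }
    req->src = src;
    req->dst = dst;
    req->count = count;
    req->done = false;
    lib->pending[lib->n_pending++] = req;
    return true;
}

/* Progress: runs whenever the application hands control to the library. */
static void toy_progress(toy_library *lib)
{
    assert(lib != NULL);
    for (size_t i = 0; i < lib->n_pending; ++i) {
        toy_request *r = lib->pending[i];
        memcpy(r->dst, r->src, r->count * sizeof(int));
        r->done = true;
    }
    lib->n_pending = 0;
}

/* Blocking completion: drive progress until this request is done.
 * The request must have been started successfully with toy_istart. */
static void toy_wait(toy_library *lib, toy_request *req)
{
    assert(lib != NULL && req != NULL);
    while (!req->done) {
        toy_progress(lib);
    }
}
```

In this model the destination buffer is untouched after toy_istart and only holds the data once toy_wait returns, which mirrors the rule that an application cannot consider an operation done until a call like MPI_WAIT or MPI_TEST says so.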

The best way to think of how MPI_SENDRECV is implemented is to do two non-blocking calls with one completion function:

MPI_Isend(..., MPI_COMM_WORLD, &req[0]);
MPI_Irecv(..., MPI_COMM_WORLD, &req[1]);
MPI_Waitall(2, req, MPI_STATUSES_IGNORE);