I've been falling in love with the ease of use of Fortran's Coarrays framework, because of how clean it is compared to lower-level APIs like MPI.

But one thing I haven't been able to tease out is whether there is a way to explicitly tell Fortran to perform puts and gets asynchronously. The benefit would be to replicate MPI's MPI_I* calls, which allow communication and computation to overlap.

I'm interested in overlapping for performance reasons. The particular application I have in mind is CFD with particle methods, where the domain is subdivided and halo particles are exchanged every time step. Using MPI point-to-point calls, which I'm currently more familiar with, I initiate the exchange of particle information between processes and then perform computation while the communications complete, roughly like this:
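```fortran
! Mark every slot as "no request" so that MPI_WAITALL can safely skip the
! entries for this process, which never posts a send or receive to itself.
request = MPI_REQUEST_NULL

do pid = 0, numprocs-1
   if (pid /= procid) then
      ! post a non-blocking send to process pid
      call MPI_ISEND(neighbours(pid+1)%sendbuff, &
                     neighbours(pid+1)%n_send,   &
                     particle_derived_type,      &
                     pid,                        &   ! destination rank
                     0,                          &   ! tag
                     MPI_COMM_WORLD,             &
                     request(pid+1),             &
                     ierr)
      ! post a non-blocking receive from the same process pid
      ! (n_recv must already be known, e.g. from an earlier count exchange)
      call MPI_IRECV(neighbours(pid+1)%recvbuff, &
                     neighbours(pid+1)%n_recv,   &
                     particle_derived_type,      &
                     pid,                        &   ! source rank
                     0,                          &   ! tag
                     MPI_COMM_WORLD,             &
                     request(numprocs+pid+1),    &
                     ierr)
   end if
end do

! do some heavy computation that neither modifies sendbuff nor touches recvbuff

! block until every posted send and receive has completed
call MPI_WAITALL(2*numprocs, request, MPI_STATUSES_IGNORE, ierr)
```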
This is just for demonstration. In reality, each process would only exchange information with its neighbouring processes, not with all of them. The advantage of using MPI_ISEND/MPI_IRECV here is that I don't have to worry about locking, and I can do some computation while the sends and receives complete.

A roughly equivalent example using Coarrays:
```fortran
do pid = 1, numprocs
   if (pid /= this_image()) then
      ! put this image's data directly into the receive buffer on image pid
      n_send = neighbours(pid)%n_send
      neighbours(this_image())[pid]%recv_buff(1:n_send) = &
         neighbours(pid)%send_buff(1:n_send)
   end if
end do

! do some heavy computation

! after this, every image's recv_buff holds the data put into it
sync all
```
which is cool, because it's much more compact. But I'm not sure whether the "puts" return after initiating the transfer like with MPI_ISEND/RECV.
So for this example, I'm interested in replicating the ability of the MPI_I* calls to overlap communication with computation using Fortran Coarrays, as it is pretty important in optimising the performance of CFD simulations.

EDIT: Hopefully a clearer explanation of why I want to overlap comms with comps.
The coarray communication model is one of remote memory access/one-sided communication, not one of point-to-point.
In the assignment statement `i = 3` of a plain single-image program, `i` is set to `3` "immediately". The reference to `i` in a following `print *, i` statement happens similarly. One doesn't question whether the "put" and "get" happen with blocking, synchronously or asynchronously.
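As a minimal sketch (the program name is illustrative), such a single-image program could look like this:

```fortran
program plain_assignment
  implicit none
  integer :: i

  i = 3          ! "put": i holds 3 as soon as this statement completes
  print *, i     ! "get": the reference simply sees the current value of i
end program plain_assignment
```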
Consider now the same program, but with `i` given the `volatile` attribute.
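A sketch of this `volatile` variant, assuming it differs from the plain program only in the declaration of `i` (the program name is again illustrative):

```fortran
program volatile_assignment
  implicit none
  integer, volatile :: i   ! i may change by means not visible to this program

  i = 3
  print *, i     ! the value is fetched from memory, not assumed to still be 3
end program volatile_assignment
```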
In the first example the processor may decide against storing to/reading from a permanent memory location for the value of `i`. In the second example, the value of `i` must be fetched rather than assumed.

When there is more than one image involved, we see something similar.¹ Take a coarray declared as `integer :: i[*]`, where image 1 executes `i = 1` followed by `i[2] = 3`.
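A minimal sketch of this two-image case, assuming a run with at least two images (for example with OpenCoarrays' `cafrun -n 2`). The `print` on image 1 is an illustrative addition: image 1 may read back `i[2]` straight away because no other image touches image 2's `i` in the meantime.

```fortran
program two_image_assignment
  implicit none
  integer :: i[*]          ! scalar coarray: each image has its own i

  if (num_images() < 2) error stop "this example assumes two images"

  if (this_image() == 1) then
    i = 1                  ! defines i on image 1 (the local copy)
    i[2] = 3               ! defines i on image 2 (a remote "put")
    print *, i, i[2]       ! image 1 sees both new values at once: 1 and 3
  end if
end program two_image_assignment
```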
Here, image 1 has two assignments, setting the value of `i` on each of the two images. Both happen "immediately". As soon as `i=1` is executed, the value of `i` on the first image is `1`. As soon as `i[2]=3` is executed, the value of `i` on the second image is `3`.

Now, "blocking" in this second assignment (in particular) comes down to what it means for the assignment to complete.
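There are two extremes of conversation that may be had. At one extreme, image 1 tells image 2 to set its `i` to `3` and waits until image 2 confirms that it has done so before carrying on. This can be compared with the other extreme, where image 1 notes that image 2's `i` should become `3` and carries on immediately, passing the value on to image 2 whenever it gets a chance.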
The Fortran standard does not say which of those conversations happens, but it leaves it open for the second to be the one that does. This second conversation does not even have to happen around the time of the assignment statement (see Fortran 2018, 11.6.2, Note 4).²
All of this is to say that the assignment could be blocking in some way, but there's no requirement or incentive for it to be. Importantly, deadlock cannot occur in assignment, even if all images are trying to assign to coarrays on all other images.
One-sided communication like this works by placing restrictions on interactions between communication partners.
Loosely, if one image sets the value of a (non-volatile) coarray, no other image is allowed to define or reference that coarray until some synchronization has happened.³ The image that sets the value is assured that until this synchronization, the coarray is exactly as this image has decided it to be.
Communication can happen exactly at the time of the assignment or at some later time; computation can stall until the value has been transmitted, even acknowledged, or continue immediately. Fortran doesn't tell us which, but a processor is allowed to defer or even eliminate communication, or to overlap communication and computation. Compiler vendors are usually keen to provide optimal behaviour.
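A program in which image 1 executes `i[2] = 3` while image 2, without any intervening synchronization, references or defines its own `i` is not valid. The restrictions that make this type of program invalid allow for a range of implementation approaches. As a programmer you have little say in, or knowledge of, which implementation is used.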
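As a sketch, assuming two images and a scalar coarray `i`, the following valid program shows where synchronization is needed. Deleting the second `sync all` would leave image 1's definition of `i[2]` and image 2's reference to `i` in unordered segments, which is exactly the kind of program these restrictions rule out.

```fortran
program segment_ordering
  implicit none
  integer :: i[*]

  if (num_images() < 2) error stop "this example assumes two images"

  i = 0                             ! each image initialises its own i
  sync all                          ! ...before image 1 may define i[2]

  if (this_image() == 1) i[2] = 3   ! image 1 defines i on image 2

  ! This SYNC ALL orders image 1's definition of i[2] before image 2's
  ! reference below. Without it the two would be in unordered segments,
  ! the program would not be valid, and no value of i would be guaranteed.
  sync all

  if (this_image() == 2) print *, i ! prints 3
end program segment_ordering
```

Nothing here dictates when the data actually moves: the processor only has to make the value visible by the time image 2's segment after the `sync all` references it.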
I've said nothing about atomic actions (including events) or collective actions.
¹ The examples that follow assume two images.

² If image 1 makes several assignments to `i[2]`, it may choose to give just the final value to image 2 in one conversation. Indeed, the second conversation can be avoided completely in some cases.

³ It's this restriction which allows us to say that the value of a coarray is affected immediately: a valid Fortran program cannot have a conflict about what the value should be.