Is/Will there be a way to perform Fortran Coarray calls asynchronously?


127 Views Asked by Edward Yang At 30 March 2023 at 22:20

I've been falling in love with the ease-of-use of Fortran's Coarrays framework, because of how clean it is compared to lower level APIs like MPI.

But one thing I haven't been able to work out is whether there is a way to explicitly tell Fortran to perform puts and gets asynchronously. The benefit would be to replicate MPI's MPI_I* calls, which allow communication and computation to overlap.

The reason why I'm interested in overlapping is for performance reasons. The particular application I have in mind is in CFD with particle methods, where the domain is subdivided and halo particles are exchanged every time-step. Using MPI p2p calls, which I'm currently more familiar with, I'm initiating the exchange of particle information between processes and then performing computation while the communications are completing, kind of like:

request = MPI_REQUEST_NULL   ! slots skipped below (pid == procid) must be null for MPI_WAITALL
do pid = 0, numprocs-1

   if (pid /= procid) then
      ! post sends
      call MPI_ISEND(neighbours(pid+1)%sendbuff, &
                     neighbours(pid+1)%n_send, &
                     particle_derived_type, &
                     pid, &
                     0, &
                     MPI_COMM_WORLD, &
                     request(pid+1), &
                     ierr)

      ! post receives
      call MPI_IRECV(neighbours(pid+1)%recvbuff, &
                     neighbours(pid+1)%n_recv, &
                     particle_derived_type, &
                     pid, &
                     0, &
                     MPI_COMM_WORLD, &
                     request(numprocs+pid+1), &
                     ierr)
   end if
end do

! do some heavy computation

call MPI_WAITALL(2*numprocs, request, status, ierr)

This is just for demonstration. In reality, each process would only communicate with its neighbour processes and not all of them. The advantage of using MPI_ISEND/MPI_IRECV here is that I don't have to worry about locking and that I can do some computation while the sends and receives are being completed.

A kind of equivalent example using Coarrays:

do pid = 1, numprocs

   if (pid /= this_image()) then
      ! put data into remote neighbour images
      n_send = neighbours(pid)%n_send
      neighbours(this_image())[pid]%recv_buff(1:n_send) = neighbours(pid)%send_buff(1:n_send)
   end if

end do

! do some heavy computation

sync all

which is cool, because it's much more compact. But I'm not sure whether the "puts" return after initiating the transfer like with MPI_ISEND/RECV.

So for this example, I'm interested in replicating MPI_I* ability to overlap communications with computation in Fortran Coarrays, as it is pretty important in optimising the performance of CFD simulations.

EDIT Hopefully clearer explanation of why I want to overlap comms with comps.


**francescalus** · Answer 1 · 2023-04-06T17:21:27.177000

The coarray communication model is one of remote memory access/one-sided communication, not one of point-to-point.

In the assignment statement in

integer i
i = 3
print *, i

end program

i is set to 3, "immediately". The reference in the print statement happens similarly.

One doesn't question whether the "put" and "get" happen with blocking, synchronously or asynchronously.

Consider now

integer, volatile :: i
i = 3
print *, i

end program

In the first example the processor may decide against storing to/reading from a permanent memory location for the value of i. In the second example, the value of i must be fetched rather than assumed.

When more than one image is involved we see something similar:¹

integer i[*]

if (this_image()==1) then
  i = 1
  i[2] = 3
end if

sync all

print *, i

end program

Here, image 1 has two assignments, setting the value of i on each of the two images. Both happen "immediately".

As soon as i=1 is executed, the value of i on the first image is 1. As soon as i[2]=3 is executed the value of i on the second image is 3.

Now, "blocking" in this second assignment (in particular) comes down to what it means for the assignment to complete.

There are two extremes of conversations that may be had:

Image 1: Hey, Image 2, you there?

Image 2: Sup?

Image 1: I'd like to set your value of i to be equal to 3.

Image 2: I'll get on it right after I've finished what I'm doing.

Image 1: No worries, I'll grab a coffee while you do it.

... time passes

Image 2: Wotcha, Image 1.

Image 1: Hello?

Image 2: I've done what you wanted. My i is now equal to 3.

Image 1: Great, thanks. I'll get back to work.

can be compared with

Image 1: Image 2, your value of i is now 3.

The Fortran standard does not say which of those conversations happens, but it leaves it open for the second to be the one that does. This second conversation does not even have to happen around the time of the assignment statement (Fortran 2018, 11.6.2 Note 4):²

In practice [..] the processor could make a copy of a nonvolatile coarray on an image [..] and, as an optimization, defer copying a changed value back to the permanent memory location while it is still being used. Since the variable is not volatile, it is safe to defer this transfer [..]

That's all to say that the assignment could be blocking in some way but there's no requirement or incentive for it to be. Importantly, deadlock cannot occur in assignment even if all images are trying to assign to coarrays on all other images.

One-sided communication like this works by placing restrictions on interactions between communication partners.

Loosely, if one image sets the value of a (non-volatile) coarray, no other image is allowed to define or reference that coarray until some synchronization has happened.³ The image that sets the value is assured that until this synchronization, the coarray is exactly as this image has decided it to be.

Communication can happen exactly at the time of the assignment, or at some later time; computation can stall until the value has been transmitted, even acknowledged, or continue immediately. Fortran doesn't tell us this but a processor is allowed to defer or even eliminate communication, or overlap communication and computation. Compiler vendors are usually keen to have optimal behaviour.
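
As a rough sketch of what this means for the pattern in the question, assuming at least two images: image 1 issues a put, carries on with local work that does not touch the coarray, and only then synchronizes with image 2. The names halo and acc are made up for illustration. Whether the transfer actually overlaps the loop is up to the implementation; the only thing the program can rely on is that image 2 sees the new value after the matching sync images.

```fortran
program put_then_work
  implicit none
  integer :: halo[*]   ! hypothetical coarray standing in for a receive buffer
  real    :: acc
  integer :: k

  halo = 0
  sync all             ! order the local initialization before any remote put

  if (num_images() >= 2) then
    select case (this_image())
    case (1)
      halo[2] = 3      ! the "put": may complete now, later, or in the background
      acc = 0.0
      do k = 1, 1000   ! local work that neither defines nor references halo
        acc = acc + sqrt(real(k))
      end do
      sync images (2)  ! after this, image 2 may reference its halo
      print *, 'image 1 finished local work:', acc
    case (2)
      sync images (1)  ! pairs with the sync images (2) on image 1
      print *, 'image 2 sees halo =', halo
    end select
  end if
end program put_then_work
```

Nothing in the source text tells the processor when to move the data, which is exactly the freedom described above.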

The program

integer i[*]
i[2] = this_image()
print *, i[2]

end program

is not valid. The restrictions which make this type of program invalid allow for a range of implementation approaches. As a programmer you have little say in or knowledge of which implementation is used.
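
With two images, both images define i on image 2 and both reference it, with nothing ordering those accesses, which is what the restriction above rules out. A minimal sketch of a valid variant, assuming at least two images (slot is a hypothetical name): each image defines its own element on image 2, and image 2 references the array only after a synchronization.

```fortran
program valid_puts
  implicit none
  integer, allocatable :: slot(:)[:]   ! hypothetical name, one element per image

  allocate (slot(num_images())[*])     ! allocating a coarray synchronizes all images
  slot = 0
  sync all                             ! local initialization precedes the remote puts

  if (num_images() >= 2) then
    slot(this_image())[2] = this_image()   ! each image defines a distinct element
  end if
  sync all                             ! the puts precede the reference below

  if (this_image() == 2) print *, slot
end program valid_puts
```

Because no two images touch the same element between synchronizations, the implementation is still free to send, defer, or batch the transfers however it likes.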

I've said nothing about atomic actions (including events) or collective actions.

¹ Examples that follow assume two images.

² If Image 1 makes several assignments to i[2], the image may choose to give just the final value to Image 2 in one conversation. Indeed, the second conversation can be avoided completely in some cases.

³ It's this restriction which allows us to say that the value of a coarray is affected immediately: a valid Fortran program cannot have a conflict about what the value should be.
