Can an acq_rel operation be split into an acquire and a release operation?


Consider this C++ statement:

foo.exchange(bar, std::memory_order_acq_rel);

Is the above statement exactly equivalent to any of the below?

1)

foo.exchange(bar, std::memory_order_acquire);
dummy.store(0, std::memory_order_release);

2)

dummy.store(0, std::memory_order_release);
foo.exchange(bar, std::memory_order_acquire);

3)

foo.exchange(bar, std::memory_order_release);
dummy.load(std::memory_order_acquire);

4)

dummy.load(std::memory_order_acquire);
foo.exchange(bar, std::memory_order_release);

In case they are not equivalent, please mention why they are not.



ALX23z

The operations are completely different, for a simple reason: a release operation on variable a is not in any way equivalent to a release operation on variable b. To synchronize with that thread, one would need to do an acquire on variable b rather than on a. That's the difference. Yes, the memory-ordering instructions are tied to variables.

So replacing acq_rel with a weaker memory order on foo plus an extra operation on dummy will not properly synchronize with threads that do an acquire or a release on foo, depending on which order was used on foo.

That said, if in addition to the exchange you did a load of foo whose value you discard, with the complementing memory order, the effect would be pretty much equivalent. Alternatively, you could issue a general fence, which triggers a stronger synchronization instruction.
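
(To make the per-variable point concrete: a minimal sketch of option 1), with hypothetical data/writer/reader names that are not from the question. The reader acquires foo, but nothing ever releases foo, so the release on dummy pairs with nobody.)

#include <atomic>

int data = 0;
std::atomic<int> foo{0}, dummy{0};

void writer() {
    data = 42;                                    // payload we'd like to publish
    foo.exchange(1, std::memory_order_acquire);   // acquire only: the store half of this RMW is not a release
    dummy.store(0, std::memory_order_release);    // releases dummy, a variable nobody acquires
}

void reader() {
    if (foo.load(std::memory_order_acquire) == 1) {   // acquires foo, but there is no release on foo to pair with
        int x = data;    // NOT guaranteed to see 42; formally a data race
        (void)x;
    }
}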

Peter Cordes

For 1) and 2) no, some other thread that loads foo won't sync-with foo.exchange(acquire) in another thread, because it's only an acquire, not a release operation. So that other thread won't safely be able to read the values of non-atomic assignments from before the exchange, or get guaranteed values for earlier atomic stores.

Options 3) and 4) have various problems in terms of (not) syncing with another writer or reader to create a happens-before relationship. That only happens when one thread does an acquire-load of the value from a release-store in another thread. If the load side of the exchange is relaxed (which is all a release-only exchange gives you), that doesn't happen.

IDK if you're thinking of dummy.store(0, std::memory_order_release); as being a 2-way barrier like atomic_thread_fence(release), but it's not; it's just a release operation, on a dummy variable that no other thread ever accesses (I assume).

See https://preshing.com/20120913/acquire-and-release-semantics/ for a description in terms of local reordering of accesses to coherent shared memory. Acquire and release operations can reorder in one direction each. The dummy release store can reorder with any later operations except ones that are themselves release or stronger, so it might as well not exist.

What would be approximately equivalent (strictly stronger I think) is:

  // Any earlier operations can't reorder past the fence
std::atomic_thread_fence(std::memory_order_release);
  // and later stores can't reorder before the fence
foo.exchange(bar, std::memory_order_acquire);  // so this store is after any earlier ops

The load part of the exchange can still reorder with earlier loads/stores on other objects so it's not much stronger. (related: For purposes of ordering, is atomic read-modify-write one operation or two?)


Also fine would be foo.exchange(bar, release) ; thread_fence(acquire).
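
Spelled out with comments (same foo and bar as the question; nothing new here beyond the comments):

foo.exchange(bar, std::memory_order_release);         // the store side is ordered by the release operation itself
std::atomic_thread_fence(std::memory_order_acquire);  // later loads/stores can't reorder before the fence,
                                                      // giving the load side of the exchange its acquire ordering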

Another answer suggests foo.exchange(bar, release) ; foo.load(acquire) would be equivalent, but it's not. The acquire load might sync-with a different thread than the one whose value the exchange saw.

If you're really not using the return value of the exchange either to check whether you should do something (if(sequence_num > x)), or to figure out what or where you should access (e.g. a pointer or array index), the acquire semantics of it are unlikely to matter at all.

But if we consider a reader like int idx = foo.exchange(bar, acq_rel); int tmp = arr[idx];, replacing the acq_rel exchange with int idx = foo.exchange(bar, release) ; foo.load(acquire) (ignoring the value of that acquire load) wouldn't be equivalent. Only an acquire barrier (fence) would order the load side of the exchange wrt. later operations.

If a store from a third thread becomes visible between the exchange(release) and load(acquire), you don't sync-with the thread that stored the value your exchange saw, only the third thread that stored the value you're ignoring.

Consider a writer that did arr[i] = 123; foo.store(i, release);
If a third thread did foo.store(0, relaxed); or whatever, the foo.load(acquire) would sync with it, not the one that wrote arr[idx]. This is of course a contrived example, and dependency ordering would save you on real CPUs even though the load side of foo.exchange was relaxed not consume. But ISO C++ formally guarantees nothing in that case. (And branching on the exchange result instead of using it as part of a load or maybe store address wouldn't let dependency ordering save you.)

If the third thread was also using exchange (even relaxed), that would create a release-sequence so your load would still sync-with the earlier writer as well. But a pure store doesn't guarantee that, breaking a release-sequence.
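
For contrast, a variant of the sketch above (still with my hypothetical names): if the third thread uses an RMW instead of a plain store, it stays in the writer's release sequence.

void third_thread() {
    foo.exchange(7, std::memory_order_relaxed);   // an RMW, with any memory order, continues the release
                                                  // sequence headed by writer's foo.store(5, release), so an
                                                  // acquire load that reads this 7 still syncs-with writer
}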

On most CPUs, where stores can only become visible to other threads by committing to coherent cache, the writer had to wait for exclusive ownership of the cache line just like for an atomic RMW. So plain stores can also continue a release-sequence, letting an acquire load sync-with all previous release stores and RMWs to the object. But ISO C++ doesn't formally guarantee that, and I wouldn't bet on it being safe on PowerPC where store-forwarding between logical cores is a thing. Except that on PPC, an acquire load is done with asm barriers, which would also strengthen the load part of an exchange.

Still, if you're trying to understand the C++ formalism, it's important to understand that the load whose value you actually use needs to be acquire, or there needs to be an acquire fence (not just an acquire operation).

Nate Eldredge

Although the C++ memory model does not describe acquire/release semantics in terms of reordering, it's still a pretty good approximation. Acquire operations can be reordered with earlier operations, but not with later; release is the other way around.

It can be helpful visually to try it with cards on a table or something like that. Each card is a load/store/RMW operation, and you start with them in program order. Then the rule is that you may swap any two adjacent cards unless the left one is acquire, or the right one is release, or both.

In what's below, let X be your foo.exchange, which we will decorate as XA or XR according to whether it is acquire or release. Let DA/DR be the dummy acquire-load or release-store. Let P be any relaxed or non-atomic operation that is sequenced before both X and D, and Q another one that is sequenced after.

In the original version, we begin with simply P XAR Q. Since X is both acquire and release, it cannot be swapped with either P or Q. (It is possible for either P or Q to be reordered between the load and store within X, but that's not really relevant here.) So if in some replacement code there is any way to move either P or Q to the opposite side of X, then it is not equivalent to the original.

In #1 it is easy. You start with P XA DR Q, but P and XA can be immediately swapped because XA is only acquire.

In #2 it takes a little more. You start with P DR XA Q, and you cannot swap P with DR, nor XA with Q. But you can swap DR with XA, and then P with XA.

P DR XA Q
P XA DR Q
XA P DR Q

I leave #3 and #4 as exercises, as they have similar solutions.