In Rust Atomics and Locks, more or less the following code is suggested for correctly implementing the Drop trait of a simplified Arc (the code below is my own rendition):
unsafe {
    // Decrement the strong count; the previous value tells us whether we were the last Arc.
    if 1 == (*self.inner).strong.fetch_sub(1, Release) {
        // Synchronize with every earlier decrement before freeing the data.
        fence(Acquire);
        let box_ = Box::from_raw(self.inner);
        drop(box_);
    }
}
where (*self.inner).strong is an AtomicUsize.
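For context, here is roughly the rest of the simplified Arc the snippet belongs to (my own sketch; the field and type names are mine, and the book's real code differs in details such as overflow handling):

use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

// Heap allocation shared by every clone of the Arc.
struct ArcData<T> {
    strong: AtomicUsize,
    data: T,
}

pub struct Arc<T> {
    inner: *mut ArcData<T>,
}

unsafe impl<T: Send + Sync> Send for Arc<T> {}
unsafe impl<T: Send + Sync> Sync for Arc<T> {}

impl<T> Arc<T> {
    pub fn new(data: T) -> Self {
        let inner = Box::into_raw(Box::new(ArcData {
            strong: AtomicUsize::new(1),
            data,
        }));
        Arc { inner }
    }
}

impl<T> Clone for Arc<T> {
    fn clone(&self) -> Self {
        // Relaxed is enough for the increment: creating a new clone publishes no data.
        unsafe { (*self.inner).strong.fetch_add(1, Relaxed) };
        Arc { inner: self.inner }
    }
}

impl<T> Deref for Arc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &(*self.inner).data }
    }
}

// Drop is the snippet above: fetch_sub(1, Release), then fence(Acquire) and
// Box::from_raw when the previous value was 1.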
So here is my question:
I understand that an acquire will establish a "happens-before/synchronizes-with" relationship with the previous release (i.e. with the thread where the loaded value was written). In order to properly drop the value, we need the fetch_subs of all the Arcs to be linked one after the other.
I don't understand how this is guaranteed here. I would understand if we acquired the atomic's value, subtracted one, and then released, all in a single operation (basically fetch_sub with AcqRel ordering, or a compare-and-swap loop like the sketch below).
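For reference, the compare-and-swap version I have in mind would look something like this (my own hypothetical sketch, not from the book):

use std::sync::atomic::Ordering::{AcqRel, Relaxed};

impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        // Hypothetical CAS-loop version, where the acquire is part of the
        // successful read-modify-write itself rather than a separate fence.
        let mut current = unsafe { (*self.inner).strong.load(Relaxed) };
        loop {
            let result = unsafe {
                (*self.inner)
                    .strong
                    .compare_exchange_weak(current, current - 1, AcqRel, Relaxed)
            };
            match result {
                // We observed 1 and stored 0: we were the last Arc, free the data.
                Ok(1) => {
                    unsafe { drop(Box::from_raw(self.inner)) };
                    break;
                }
                // Decremented successfully, but other Arcs remain.
                Ok(_) => break,
                // Another thread changed the count first; retry with the fresh value.
                Err(actual) => current = actual,
            }
        }
    }
}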
Here, we only "acquire" when we load 1. My understanding of using Release on fetch_sub is that the fetch part is relaxed. Based on this understanding, I have two issues with the aforementioned code:
1.
Wouldn't this case be possible:
Thread A:
    load: 2 (t = 1)
    write: 1 (t = ?)
Thread B:
    load: 2 (t = 1)
    write: 1 (t = ?)
and thus the value was never dropped? I fail to see what guarantees that the load has exclusive access to the value, i.e. that the loads always happen in order.
2.
Am I not establishing only a "happens-before" relationship between the reading of 1 and the drop? I think this is fine, since I can only drop if I indeed see a 1, but how am I guaranteed to see that 1 and not have a data race on the atomic?
I suspect my misunderstanding comes from the guarantees provided by atomics/orderings, but I can't pin down what exactly I'm missing.
Atomic read-modify-write operations are always atomic, regardless of the memory ordering chosen. So if, let's say, (*self.inner).strong has the value 5, and you then do fetch_sub(1, Relaxed) in any combination of different threads, you are still guaranteed that exactly one of them will load the value 2 and store the value 1. Rust doesn't have a formally defined memory model, but AFAIK in practice it follows C++, which specifies this behavior by saying that "an atomic read-modify-write always loads the value that, in the modification order of the object, immediately precedes the value it stored." https://timsong-cpp.github.io/cppwp/n4868/atomics.order#10
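You can see this in a small standalone experiment (my own sketch, not from the book): five threads each do one relaxed fetch_sub starting from 5, and the set of previous values they observe is always exactly {5, 4, 3, 2, 1}, so in particular exactly one of them loads 2 and stores 1.

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::thread;

fn main() {
    let counter = AtomicUsize::new(5);

    // Each thread performs one relaxed fetch_sub and reports the previous value it observed.
    let observed: Vec<usize> = thread::scope(|s| {
        let handles: Vec<_> = (0..5)
            .map(|_| s.spawn(|| counter.fetch_sub(1, Relaxed)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Every read-modify-write sees the value that immediately precedes its own
    // store in the modification order, so 5, 4, 3, 2, 1 each appear exactly once.
    let mut sorted = observed.clone();
    sorted.sort_unstable();
    assert_eq!(sorted, vec![1, 2, 3, 4, 5]);
    println!("observed (in completion order): {observed:?}");
}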
You do have a happens-before relationship between the load of 1 and the drop, but that is trivial; for operations within a single thread, happens-before always follows program order. Confusingly, this does not mean that they cannot be reordered for execution!
What you really care about is that all prior accesses to box_, from other threads, must happen-before the drop. That's the condition that avoids a data race. And this you do get, thanks to the use of acquire and release ordering.

It would be a little simpler to see why if the fetch_sub used AcqRel ordering. In that case, we would have:

- the last prior access happens-before the store of the value 1 (because it precedes that fetch_sub in program order within a single thread);
- the store of 1 synchronizes with the load of 1, because the former is release, the latter is acquire, and the value observed by the load was the value put there by the store;
- the load of 1 happens-before the drop (program order);
- happens-before is transitive.
You can argue similarly for all other accesses to the box.
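Concretely, the AcqRel variant of the drop would look something like this (a sketch against the same simplified Arc as above, not the book's final code):

use std::sync::atomic::Ordering::AcqRel;

impl<T> Drop for Arc<T> {
    fn drop(&mut self) {
        unsafe {
            // The read-modify-write is both a release (publishing our own prior
            // writes) and an acquire (observing everyone else's), so the decrement
            // that sees 1 directly happens-after all earlier accesses to the data.
            if 1 == (*self.inner).strong.fetch_sub(1, AcqRel) {
                drop(Box::from_raw(self.inner));
            }
        }
    }
}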
But using AcqRel is somewhat inefficient, because we are paying for acquire ordering on every fetch_sub, but we really only need it for the last one (the one which loads the value 1). So in this code, we instead use Release ordering only on the fetch_sub, together with an Acquire fence which is only executed conditionally, when the value loaded equals 1. The fence can be thought of as "retroactively" promoting the relaxed load of the fetch_sub to acquire. The formal proof takes a little more work, but it all does work out.

There's another side question: with Release ordering only, you no longer have a full happens-before chain involving all the fetch_subs, so how do you guarantee that the drop happens-after the second-to-last access? The C++ memory model handles this with the concept of a "release sequence". Normally, a load can only synchronize with a store if it loads the value put there by the store. But release sequences mean it also "counts" if you load a value that was produced by some sequence of atomic read-modify-writes acting on the value originally stored. So your drop not only happens-after the store of the value 1, it also happens-after a hypothetical store of the value 47 (since we could only have reached 1 via some long chain of atomic fetch_add and fetch_sub).
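To make the release-sequence idea concrete, here is a small three-thread sketch (mine, with made-up values): one thread publishes some data and does a release store on a counter, a second thread performs only a relaxed read-modify-write on that counter, and a third thread's acquire load that observes the RMW's result still synchronizes with the first thread's release store.

use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Relaxed, Release}};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static COUNT: AtomicUsize = AtomicUsize::new(0);

fn main() {
    thread::scope(|s| {
        // Thread 1: publish DATA, then head a release sequence with a release store.
        s.spawn(|| {
            DATA.store(42, Relaxed);
            COUNT.store(2, Release);
        });

        // Thread 2: a relaxed RMW on COUNT. If it acts on the value 2 stored above,
        // it extends that release sequence without needing any ordering of its own.
        s.spawn(|| {
            COUNT.fetch_sub(1, Relaxed);
        });

        // Thread 3: observing 1 means the RMW acted on thread 1's store, so this
        // acquire load synchronizes with that store, and reading DATA is guaranteed
        // to see 42.
        s.spawn(|| {
            if COUNT.load(Acquire) == 1 {
                assert_eq!(DATA.load(Relaxed), 42);
            }
        });
    });
}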