For which STORE_ORDER and LOAD_ORDER (if any) does C++11 guarantee that this code runs in finite time?
std::atomic<bool> a{false};
std::thread t{[&]{
    while(!a.load(LOAD_ORDER));
}};
a.store(true, STORE_ORDER);
t.join();
I see two issues with this:
Memory order
It seems to me that with release and acquire, the compiler and CPU are allowed to reorder my join (assuming it behaves like a load) before the store, which would of course break this.
Even with memory_order_seq_cst, I'm not sure if such a reordering is prohibited because I don't know if join() actually does any loads or stores.
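One thing the standard does specify about join() is that the completion of the thread synchronizes with the corresponding successful join() return. Here is a minimal sketch of what that gives you; read_result_after_join is a made-up name for illustration. It only shows the ordering that join() provides when it returns. It says nothing about when the store in the code above becomes visible.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper, not part of the code in the question: the worker
// writes a plain (non-atomic) int and the caller reads it after join().
int read_result_after_join() {
    int result = 0;                        // deliberately not atomic
    std::thread t{[&] { result = 42; }};
    t.join();                              // thread completion synchronizes with this return
    assert(!t.joinable());                 // the thread has been joined
    return result;                         // the worker's write happens before this read
}
```

Calling read_result_after_join() returns 42 without a data race, because the worker's write is ordered before the read by the join().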
Visibility
If I understood this question about memory_order_relaxed correctly, it is not guaranteed that a store with memory_order_relaxed becomes visible to other threads in a finite amount of time. Is there such a guarantee for any of the other orderings?
I understand that std::atomic is about atomicity and memory ordering, not about visibility. But I am not aware of any other tools in C++11 that could help me here. Would I need to use a platform-specific tool to get a correctness guarantee here, and if so, which one?
To take this one step further – if I have finiteness, it would be nice to also have some promise about speed. I don't think the C++ standard makes any such promises. But is there any compiler or x86-specific way to get a promise that the store will become visible to the other thread quickly?
In summary: I'm looking for a way to swiftly stop a worker thread that is actually guaranteed to have this property. Ideally this would be platform-independent. But if we can't have that, does it at least exist for x86?
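To make the pattern concrete, here is a sketch of the stop flag with the placeholders filled in with release and acquire. That choice is an assumption for illustration only, and stop_worker_once is a made-up name. Whether the standard guarantees that the loop ever exits is exactly what I am asking, so treat this as the code under discussion and not as a known-correct solution.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: starts a worker, asks it to stop, and reports
// whether the worker left its loop because it observed the flag.
bool stop_worker_once() {
    std::atomic<bool> stop{false};
    bool worker_saw_stop = false;   // plain bool: written by the worker, read after join()

    std::thread worker{[&] {
        // Acquire load, pairing with the release store below.
        while (!stop.load(std::memory_order_acquire)) {
            std::this_thread::yield();   // optional: be polite while spinning
        }
        worker_saw_stop = true;
    }};

    // Release store. Assumption: the implementation makes this visible to
    // the worker eventually; the orderings alone do not obviously promise that.
    stop.store(true, std::memory_order_release);
    worker.join();                  // returns only after the worker has left its loop
    assert(!worker.joinable());
    return worker_saw_stop;
}
```

If the call returns at all, it returns true, since the only way out of the worker's loop is to read stop == true.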
After some more searching, I found a question that is identical to the visibility part of mine, which got a clear answer: there is indeed no such guarantee. There is only the request that "implementations should make atomic stores visible to atomic loads within a reasonable amount of time". The standard does not define what it means by "should", but I will assume the normal meaning, so this would be non-binding. It is also not quite clear what "reasonable" means, but I would assume it clearly excludes "infinite".
This doesn't quite answer the question about memory ordering. But if the store is ordered after the join(), which may block forever, the store would never become visible to the other threads, which would not be a "reasonable amount of time".
So while the standard does not require the code in the question to be valid, it at least suggests that it should be valid. As a bonus, it actually says that it shouldn't just be finite time, but also somewhat fast (or well, reasonable).
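To get a feel for what "reasonable" means in practice on a given machine, the delay can be measured. This is a rough sketch, assuming release/acquire orderings and that steady_clock readings taken on different threads are comparable; measure_stop_latency is a made-up name. A measurement like this is an observation, not a guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: returns the time between issuing the store and the
// worker noticing it, as measured with steady_clock.
std::chrono::nanoseconds measure_stop_latency() {
    using steady = std::chrono::steady_clock;
    std::atomic<bool> stop{false};
    std::atomic<bool> ready{false};
    steady::time_point seen;        // written by the worker, read after join()

    std::thread worker{[&] {
        ready.store(true, std::memory_order_release);    // worker is about to spin
        while (!stop.load(std::memory_order_acquire)) {}
        seen = steady::now();                            // flag observed
    }};

    while (!ready.load(std::memory_order_acquire)) {}    // wait until the worker runs
    const steady::time_point issued = steady::now();
    stop.store(true, std::memory_order_release);
    worker.join();
    assert(!worker.joinable());
    return std::chrono::duration_cast<std::chrono::nanoseconds>(seen - issued);
}
```

The result includes scheduling noise and the cost of reading the clock, so it is at best an upper bound on the visibility delay for one particular run.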
That leaves the part of my question about a platform-specific solution: Is there an x86-specific way to write the requested algorithm so it is actually guaranteed to be correct?