// Thread 1:
x.store(1, std::memory_order_seq_cst); // A
y.store(1, std::memory_order_release); // B
// Thread 2:
r1 = y.fetch_add(1, std::memory_order_seq_cst); // C
r2 = y.load(std::memory_order_relaxed); // D
// Thread 3:
y.store(3, std::memory_order_seq_cst); // E
r3 = x.load(std::memory_order_seq_cst); // F
is allowed to produce r1 == 1 && r2 == 3 && r3 == 0, where A happens-before C, but C precedes A in the single total order C-E-F-A of memory_order_seq_cst (see Lahav et al).
I am a novice and have been confused while reading about the memory model recently. How should I read the statement "A happens-before C"? Shouldn't a happens-before relation between different threads normally need a while loop to do the synchronization? Does the context here mean "if C happens to load the value stored by A" (it may not, because there is no while loop)? In addition, how should I understand "A happens-before C, but C precedes A in the single total order"? From the perspective of Thread 2, A happens-before C, so the result of A should be visible to C. Why then does C precede A?
The following content uses many levels of markdown list items, which may be difficult to view on mobile devices.
I recently read the same cppreference section and had the same question as you at first; I got the idea after reading related SO Q&As and papers. I hope this answer can also help you understand what cppreference says.
Your question seems to be a duplicate of QA_1 and QA_2, but neither Q&A seems to have read the original paper_1, which is referenced in cppreference_1 and paper_2. So here I give an answer based mainly on paper_1 and partly on its reference paper_3, from a mathematical perspective, which may help in grasping the underlying ideas if you have basic mathematical knowledge.
These are based on my understanding of the papers and related links. Please point out any errors; thanks in advance.
Short answer:
The QA_1 answer explains how this can occur in the real world (it can be due to cache consistency, which causes different threads to see different orders of writes to different variables).
- `S(k, m)`: B synchronizes with (sw) C on the variable `y` but not on `x`, so A happens before C. (This is the definition of "happens before": a sequence composed of sb and sw.)
- `S(m, o, p, k)` (here I coalesced the different `S(,)` by order). `S(p, k)` is due to "reads-before" (IMO, this means "reads-before-write"). Of `S(k, m)` and `S(m, o, p, k)`, the former stays "happens-before" and the latter is the new total modification order.
- `(A)-B-C-E-D-F-A` (here "(A)" means it runs earlier but is observed later).

Notice: the optimized memory model introduced in C++20 may still be flawed. However, I'm not a compiler or computer architecture expert, so finding the flaws is beyond my abilities.
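To make the example concrete, here is a minimal sketch that wraps the three threads from the question in a small harness. The names `Outcome`, `run_once`, `is_weak_outcome` and `count_weak` are hypothetical helpers I introduce for illustration. Whether the outcome `r1 == 1 && r2 == 3 && r3 == 0` ever shows up depends on the hardware and the compiler; on strongly ordered machines it may never appear, so a count of zero proves nothing about the model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Outcome of one run of the three-thread example (hypothetical helper type).
struct Outcome { int r1, r2, r3; };

// Runs the example once with fresh x and y, both initially 0.
Outcome run_once() {
    std::atomic<int> x{0}, y{0};
    Outcome out{};
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);               // A
        y.store(1, std::memory_order_release);               // B
    });
    std::thread t2([&] {
        out.r1 = y.fetch_add(1, std::memory_order_seq_cst);  // C
        out.r2 = y.load(std::memory_order_relaxed);          // D
    });
    std::thread t3([&] {
        y.store(3, std::memory_order_seq_cst);               // E
        out.r3 = x.load(std::memory_order_seq_cst);          // F
    });
    t1.join();
    t2.join();
    t3.join();
    return out;
}

// True for the outcome discussed in the question.
bool is_weak_outcome(const Outcome& o) {
    return o.r1 == 1 && o.r2 == 3 && o.r3 == 0;
}

// Counts how often that outcome is observed in `runs` independent runs.
int count_weak(int runs) {
    assert(runs >= 0);
    int hits = 0;
    for (int i = 0; i < runs; ++i) {
        if (is_weak_outcome(run_once())) ++hits;
    }
    return hits;
}
```

Calling `count_weak(100000)` from a `main` function gives a rough feel for how rare the outcome is on a given machine.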
TL;DR (search for anything you don't understand after reading paper_1). Detailed answer, mainly based on paper_1:
The following needs some knowledge of discrete mathematics, so I add some descriptions for readers who are not yet familiar with it.
If you don't want to get stuck on the "math", reading the "non-math" parts below is enough.
Notice: only some symbols are rephrased here; you had better read the original papers if you have questions about the terminology.
math
Part of the following primitive symbol definitions (mainly about relations) can be found in "Notation 1" in paper_1 and in "Definition 8" in paper_3 (like `;`), which is referenced in paper_1.
Here I assume that both papers use the same primitive math symbols, because paper_1 "Remark 1" says:
After viewing the paper_3 footnotes on p5: it is mainly based on the ISO standard and maps it to pure math, which may be more intuitive if you have a better mathematical background.
x meaning.)
Here `;` is the composition of relations (i.e. `[A] -> R -> [B]`). This means that a pair (a, b), with a from A and b from B, is in the relation R.
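As a small illustration of the notation, here is a sketch that models a relation as a set of event pairs and implements `;` and `[A]` directly. The names `Rel`, `compose`, `ident` and `restrict_rel` are my own and do not come from the papers.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Event = char;
using Rel = std::set<std::pair<Event, Event>>;

// R1 ; R2 = { (a, c) | there is some b with (a, b) in R1 and (b, c) in R2 }
Rel compose(const Rel& r1, const Rel& r2) {
    Rel out;
    for (const auto& [a, b] : r1) {
        for (const auto& [b2, c] : r2) {
            if (b == b2) out.insert({a, c});
        }
    }
    return out;
}

// [A] = the identity relation restricted to the set A
Rel ident(const std::set<Event>& s) {
    Rel out;
    for (Event e : s) out.insert({e, e});
    return out;
}

// [A] ; R ; [B] = the pairs of R that start in A and end in B
Rel restrict_rel(const std::set<Event>& a, const Rel& r,
                 const std::set<Event>& b) {
    return compose(compose(ident(a), r), ident(b));
}
```

For example, with `R = {(A,B), (B,C), (A,C)}`, `restrict_rel({'A'}, R, {'C'})` keeps only `(A,C)`.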
happens-before definition:
non-math
See this (better than the cppreference one because of the 2 levels), which is referenced in both QAs above.
This implicitly includes consume.
As the paper_1 says:
So the above cppreference definition may be more general than the following math definitions from the paper_1.
math
$$hb \triangleq (sb \cup sw)^{+}$$
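Read as "hb is the transitive closure of sb or sw", this definition can be checked mechanically on the example from the question. The sketch below uses my own simplified encoding: `sb` holds the three program-order pairs and `sw` holds the single pair B to C, which assumes that C reads the value written by B (i.e. `r1 == 1`). The helper names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Event = char;
using Rel = std::set<std::pair<Event, Event>>;

// Union of two relations ("or").
Rel unite(const Rel& a, const Rel& b) {
    Rel out = a;
    out.insert(b.begin(), b.end());
    return out;
}

// R^+ : keep adding (a, c) whenever (a, b) and (b, c) are present,
// until no new pair appears.
Rel transitive_closure(Rel r) {
    bool changed = true;
    while (changed) {
        changed = false;
        const Rel snapshot = r;
        for (const auto& [a, b] : snapshot) {
            for (const auto& [b2, c] : snapshot) {
                if (b == b2 && r.insert({a, c}).second) changed = true;
            }
        }
    }
    return r;
}

// hb for the example, assuming C reads from the release store B.
Rel example_hb() {
    const Rel sb = {{'A', 'B'}, {'C', 'D'}, {'E', 'F'}};
    const Rel sw = {{'B', 'C'}};
    return transitive_closure(unite(sb, sw));
}
```

The result contains the pair (A, C), which is the "A happens-before C" of the cppreference text, and it contains no pair that links Thread 3 to the other threads.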
- `+` means transitive closure (i.e. a sequence of one or more steps) and `U` (i.e. `∪`) means "or".
- Above, `S(k, m)` has the "happens-before" relation because of the `S(l, m)` synchronization (this is because `l` and `m` have an implied data dependency, i.e. `m` reads the write of `l`).

sw
non-math
Think of it as the `rel`, `acq` sequence, as the other section of cppreference_1 says.
math (see paper_1 for more details)
$$sw \triangleq [E^{\sqsupseteq rel}];\ ([F]; sb)^{?};\ rs;\ rf;\ [R^{\sqsupseteq rlx}];\ (sb; [F])^{?};\ [E^{\sqsupseteq acq}]$$
- `?` means the step is optional, which implies an "or" relation.
- `sc` from the bottom-right figure on p7 of paper_1.
- `[F]` is always explicitly placed on weak architectures like POWERPC referenced in paper_1, and is related to `rel` or `acq`, etc. (see this link from paper_2 for how the compiler adds these fences).
- `rs; rf` is implied by `rel` and `acq`.
- `rs`: in the definition of `rs`, `rf; rmw` is there to get a sequence like "write, read, write".
- `rel` and `acq`.

definition of `sb`. Here I take the definition in paper_3 to highlight that the events are in the same thread.
non-math
From cppreference_2:
math
$$\ldots^{2};\ sb \subseteq thd$$
Then `sb` means:
and `thd`:
`I`:
definition of the single total order (`S` in paper_1)
non-math
as the paper_1 says:
math from the paper_3
- `id` is the "identity relation".
- The `S` order is defined at runtime.
- `acy(S)` (S is acyclic) is significant because this is what caused the failure in the example above.

relations of "the single total order" and "happens-before" and reasons for changes
original version in c++11 from the paper_1:
non-math:
`S` needs to conform to the hb (happens-before) order.
math: see the above equation.
Why does the original C++11 model fail with specific examples?
Because the `sync` fences are dropped, which then implies a weaker memory model.
Quotes from paper_1, "Fixing the Model":
And see paper_2:
(The above references `isync`, which is probably used at the end to ensure the right context and avoid "intra-process reordering" by refetching. I have never programmed POWERPC, so I only offer the related reference links; this is not from my own experience.)
The above 2 quotes say the same thing about how the synchronization fences get dropped (i.e. the `hwsync` for "Store Seq Cst" by `sc` is avoided and only `lwsync` exists).
from paper_1:
non-math
See above "[F]" about the sync fence.
Only `sb` (which cannot cause a cycle, because it follows program order) is taken into account now in C++20, instead of the whole `hb` (happens-before).
math
`hb` -> `sb` ... (i.e. the total modification order `S` no longer takes the whole `hb` into account in C++20).
After the change, the example pattern `sb; hb` is dropped (see Figure 3: `S(k, l)` is `sb`; specifically, `hb = sw` here). So the old happens-before edge between A and C is no longer taken into account in the total modification order.
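The effect of dropping that edge can be sketched as a cycle check over the four seq_cst events A, C, E, F. The edge sets below are my own simplified reading of the example, not the papers' full definitions: C before E (in the modification order of `y`, since `r1 == 1` and `r2 == 3`), E before F (program order), F before A (F reads `x == 0`, so it reads before A's write), plus A before C (happens-before) in the C++11 version only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Event = char;
using Edges = std::vector<std::pair<Event, Event>>;
using Adjacency = std::map<Event, std::vector<Event>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
bool dfs(Event node, const Adjacency& adj,
         std::set<Event>& on_stack, std::set<Event>& done) {
    if (on_stack.count(node)) return true;   // back edge: cycle found
    if (done.count(node)) return false;      // already fully explored
    on_stack.insert(node);
    auto it = adj.find(node);
    if (it != adj.end()) {
        for (Event next : it->second) {
            if (dfs(next, adj, on_stack, done)) return true;
        }
    }
    on_stack.erase(node);
    done.insert(node);
    return false;
}

// True if the "must precede" constraints cannot be put in one total order.
bool has_cycle(const Edges& edges) {
    Adjacency adj;
    for (const auto& [from, to] : edges) adj[from].push_back(to);
    std::set<Event> on_stack, done;
    for (const auto& entry : adj) {
        if (dfs(entry.first, adj, on_stack, done)) return true;
    }
    return false;
}

// Assumed constraints on S for the outcome r1 == 1, r2 == 3, r3 == 0.
Edges cxx11_constraints() {
    return {{'A', 'C'}, {'C', 'E'}, {'E', 'F'}, {'F', 'A'}};
}
Edges cxx20_constraints() {
    return {{'C', 'E'}, {'E', 'F'}, {'F', 'A'}};
}
```

Under these assumptions the C++11 constraints contain the cycle A, C, E, F, A, so no single total order exists and the outcome is forbidden. Without the A to C edge the remaining constraints are acyclic, which matches the order C-E-F-A quoted from cppreference.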