x86 sequential consistency with a mfence


I'm trying to understand how an mfence guarantees sequential consistency on x86.

Take this code, for example:

#include <atomic>

std::atomic<int> a, b, r;

void write_a()
{
    a.store(1, std::memory_order_seq_cst);    
}

void write_b()
{
    b.store(1, std::memory_order_seq_cst);    
}

void read_a_b()
{
    while(!a.load(std::memory_order_seq_cst));
    if(b.load(std::memory_order_seq_cst)) {
        r++;
    }
}

void read_b_a()
{
    while(!b.load(std::memory_order_seq_cst));
    if(a.load(std::memory_order_seq_cst)) {
        r++;
    }
}

gcc 9.5 with -O3 generates the following assembly for the write_a and write_b functions:

write_a():
        mov     DWORD PTR a[rip], 1
        mfence
        ret
write_b():
        mov     DWORD PTR b[rip], 1
        mfence
        ret

When std::memory_order_release stores are used instead, the code becomes:

write_a():
        mov     DWORD PTR a[rip], 1
        ret
write_b():
        mov     DWORD PTR b[rip], 1
        ret

So essentially the mfence is just dropped.

Now, to my understanding, with sequential consistency a result of r==0 is impossible, whereas with acquire-release ordering it is theoretically possible.

As far as I know, mfence makes sure that the store buffer gets "flushed", and every store and load that follows it is stalled until the memory operations before the mfence have completed globally. However, in my example no other memory operations follow the mfence, so I don't understand how it makes any difference with regard to the sequentially consistent visibility of the changes to a and b.
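For contrast, here is a minimal sketch of the case where a load does follow the store, i.e. the classic store-buffering pattern. It is only meant to illustrate the description of mfence above, not to answer the question. The names `sb`, `x`, `y`, `r1`, `r2` and `run_once` are hypothetical and not part of my original code.

```cpp
#include <atomic>
#include <thread>

// Hypothetical store-buffering sketch; all names here are illustrative only.
namespace sb {

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;  // written by one thread each, read only after join()

void thread_x()
{
    x.store(1, std::memory_order_seq_cst);   // the store that gets the fence
    r1 = y.load(std::memory_order_seq_cst);  // a load that follows the store
}

void thread_y()
{
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
}

// Runs both threads once; returns true if both loads observed 0.
bool run_once()
{
    x = 0; y = 0; r1 = 0; r2 = 0;
    std::thread t1(thread_x), t2(thread_y);
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

} // namespace sb
```

With seq_cst on all four operations, `sb::run_once()` should never return true, because some store comes first in the single total order and the other thread's load must then see it. With release stores and acquire loads, the C++ model permits both loads to return 0. My example differs from this pattern in that nothing follows the store in the writer threads.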

In particular, what happens if thread1 has executed mov DWORD PTR a[rip], 1 but has not yet started executing mfence, and thread2 has analogously executed mov DWORD PTR b[rip], 1 but has not yet started executing mfence?

write_a():
        mov     DWORD PTR a[rip], 1 <- thread1 finished this operation
        mfence                      <- thread1 has not yet executed this operation
        ret
write_b():
        mov     DWORD PTR b[rip], 1 <- thread2 finished this operation
        mfence                      <- thread2 has not yet executed this operation
        ret

At this point, the code executed so far is the same as the code generated for the std::memory_order_release stores, so up to this point only "release" stores have taken place.

Now, if we have thread3 executing read_a_b() and thread4 executing read_b_a(), I believe they could still disagree about the order of the writes to a and b, so a result of r==0 is still theoretically possible. Only after thread1 and thread2 have executed their respective mfences would this no longer be possible.
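To make the four-thread scenario concrete, here is a minimal sketch of a harness that runs each function on its own thread and returns the final value of r. The namespace `iriw` and the helper `run_once` are hypothetical additions for illustration; the four functions are the same as in my code above.

```cpp
#include <atomic>
#include <thread>

// Hypothetical harness; `iriw` and `run_once` are illustrative names only.
namespace iriw {

std::atomic<int> a{0}, b{0}, r{0};

void write_a() { a.store(1, std::memory_order_seq_cst); }
void write_b() { b.store(1, std::memory_order_seq_cst); }

void read_a_b()
{
    while (!a.load(std::memory_order_seq_cst)) {}  // spin until a is visible
    if (b.load(std::memory_order_seq_cst)) {
        r++;
    }
}

void read_b_a()
{
    while (!b.load(std::memory_order_seq_cst)) {}  // spin until b is visible
    if (a.load(std::memory_order_seq_cst)) {
        r++;
    }
}

// Resets the shared state, runs one thread per function, returns r.
int run_once()
{
    a = 0; b = 0; r = 0;
    std::thread t1(write_a), t2(write_b), t3(read_a_b), t4(read_b_a);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return r.load();
}

} // namespace iriw
```

Calling `iriw::run_once()` in a loop and checking for a return value of 0 is how I would look for the outcome in question. With seq_cst everywhere, the C++ memory model only allows 1 or 2. A run that never shows 0 proves nothing about weaker orderings, though, since a test like this can only demonstrate that an outcome occurs, not that it cannot.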

What am I getting wrong?

I know that gcc 10 uses xchg instead of mov + mfence, but my underlying question remains the same.
