Peterson's algorithm: making data modified under the lock visible to the second thread with mfence/lfence/sfence


On https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C++0x_atomics.html I wrote comments and asked two questions, and I have a further question about Anthony's reply. Here is the reply:

"1. The acquire/release on the flag0 and flag1 variables are necessary to ensure that it acts as a lock: the release store in the unlock synchronizes with the acquire-load in the next lock, to ensure that the data modified while the lock was held is now visible to the second thread."

I have written a Peterson lock in C++:

  typedef struct {
      volatile bool flag[2];
      volatile int victim;
  } peterson_lock_t;

  void peterson_lock_init(peterson_lock_t &lock) {
      lock.flag[0] = lock.flag[1] = false;
      lock.victim = 0;
  }

  void peterson_lock(peterson_lock_t &lock, int id) {
      lock.flag[id] = true;
      lock.victim = id;
      asm volatile ("mfence" : : : "memory");
      while (lock.flag[1 - id] && lock.victim == id) {
      }
  }

  void peterson_unlock(peterson_lock_t &lock, int id) {
      lock.flag[id] = false;
  }

I have tested it and I believe it is correct. Is it?

If it is right, my question is: do I need to add sfence and lfence to “make sure the data modified while the lock was held is now visible to the second thread”? Like this:

  void peterson_lock(peterson_lock_t &lock, int id) {
      lock.flag[id] = true;
      lock.victim = id;
      asm volatile ("mfence" : : : "memory");
      asm volatile ("lfence" : : : "memory"); // here; I think this is unnecessary, since mfence already orders earlier loads and stores
      while (lock.flag[1 - id] && lock.victim == id) {
      }
  }

  void peterson_unlock(peterson_lock_t &lock, int id) {
      asm volatile ("sfence" : : : "memory"); // here
      lock.flag[id] = false;
  }

I think there is no need to do this. My understanding is that on x86/x86-64 a store has release semantics and a load has acquire semantics (the root reason being that x86/x86-64 only permits store-load reordering). `lock.flag[id] = false` is a store and `lock.flag[1 - id]` is a load, so there is no need for the acquire/release operations on flag0 and flag1 that appear in Dmitriy's implementation.

EDIT: @Anthony, thank you very much for your reply. Yes, I need to prevent compiler reordering. Is the modification below correct? For x86, I believe I only need to forbid compiler reordering in peterson_unlock:

void peterson_lock(peterson_lock_t &lock, int id) {
    lock.flag[id] = true;
    lock.victim = id;
    asm volatile ("mfence" : : : "memory");
    while (lock.flag[1 - id] && lock.victim == id) {
    }
}

void peterson_unlock(peterson_lock_t &lock, int id) {
    asm volatile ("" : : : "memory"); // here: a compiler barrier, to prevent compiler reordering
    lock.flag[id] = false;
}

1 Answer

Answer by Anthony Williams:

The use of atomic operations and their memory ordering flags does more than choose the instruction. It also affects the compiler's optimiser.

volatile reads and writes can't be reordered with each other, and must be issued, but can be freely reordered with other code.

It is undefined behaviour to access a volatile non-atomic variable from multiple threads without synchronization, just as it is for non-volatile non-atomic variables.

Thus

int a;
peterson_lock(some_lock,0);
a=42;
peterson_unlock(some_lock,0);

may be reordered to

int a;
a=42;
peterson_lock(some_lock,0);
peterson_unlock(some_lock,0);

or

int a;
peterson_lock(some_lock,0);
peterson_unlock(some_lock,0);
a=42;

neither of which preserves the lock functionality.

Because a store with the memory_order_release ordering ensures that previous writes are visible to a later load with memory_order_acquire ordering, this essentially implies that the compiler cannot reorder prior stores past the unlock, if you use the atomic operations with memory_order_release.
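For illustration, here is a minimal sketch of the release side; the names data and ready are illustrative, not from the original code:

    #include <atomic>

    int data = 0;                    // ordinary, non-atomic data
    std::atomic<bool> ready{false};

    void producer() {
        data = 42;                                    // this store cannot sink below the release store
        ready.store(true, std::memory_order_release); // acts like the unlock: publishes the write to data
    }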

Likewise, because a load with memory_order_acquire ordering ensures that writes from another thread may now be visible when they weren't previously, this essentially implies that the compiler cannot reorder subsequent loads before the lock, if you use the atomic operations with memory_order_acquire.
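Continuing the same sketch, the acquire side might look like this:

    void consumer() {
        // acts like the lock: later loads cannot be hoisted above this acquire load
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer's release store becomes visible
        }
        // the acquire load synchronized with the release store,
        // so we are guaranteed to see data == 42 here
        int seen = data;
        (void)seen;
    }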

In short: you need the memory ordering constraints in the lock and unlock, not just for the choice of instruction, but also (and just as importantly) for the effects on the compiler.

For x86, relaxed, acquire and seq_cst loads are all unfenced mov instructions, but their effects on the compiler are quite different.

If you do not require the memory ordering semantics of the atomic operations, use memory_order_relaxed on all your operations. This will ensure the atomicity of the operation (and avoid undefined behaviour), without adding extraneous ordering requirements. Thus, whenever you would be tempted to use a volatile variable for synchronization, you should instead use an atomic variable, with appropriate memory orderings (including memory_order_relaxed).
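As a small sketch of that advice, a shared event counter that needs atomicity but no ordering could be written like this (the names hits and record_hit are illustrative):

    #include <atomic>

    std::atomic<unsigned> hits{0};  // shared between threads

    void record_hit() {
        // atomic (no data race, no undefined behaviour), but imposes
        // no ordering on surrounding memory operations
        hits.fetch_add(1, std::memory_order_relaxed);
    }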

You should never need to add additional asm statements for synchronization in C++ code. The atomic operations and fence functions are sufficient.

Your code is still not correct: both the flag variables and the victim variable are touched by multiple threads, and they are not atomic.
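To make that concrete, below is one possible corrected sketch using std::atomic. It conservatively uses seq_cst for the whole lock handshake, since Peterson's algorithm depends on store-load ordering, which plain acquire/release does not provide; the linked article discusses when weaker orderings plus fences suffice. This is a sketch, not Anthony's exact code:

    #include <atomic>

    struct peterson_lock_t {
        std::atomic<bool> flag[2];
        std::atomic<int> victim;
    };

    void peterson_lock_init(peterson_lock_t &lock) {
        lock.flag[0].store(false, std::memory_order_relaxed);
        lock.flag[1].store(false, std::memory_order_relaxed);
        lock.victim.store(0, std::memory_order_relaxed);
    }

    void peterson_lock(peterson_lock_t &lock, int id) {
        // seq_cst gives the store-load ordering the algorithm depends on,
        // and also stops the compiler from reordering around the lock
        lock.flag[id].store(true, std::memory_order_seq_cst);
        lock.victim.store(id, std::memory_order_seq_cst);
        while (lock.flag[1 - id].load(std::memory_order_seq_cst) &&
               lock.victim.load(std::memory_order_seq_cst) == id) {
            // spin: the other thread is in, or entering, its critical section
        }
    }

    void peterson_unlock(peterson_lock_t &lock, int id) {
        // release store: writes made while holding the lock become visible
        // to the next thread whose lock-acquire load sees this store
        lock.flag[id].store(false, std::memory_order_release);
    }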