https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_with_C++0x_atomics.html
I wrote comments and asked two questions and have another question about Anthony's reply.
here is the reply:
"1. The acquire/release on the flag0 and flag1 variables are necessary to ensure that it acts as a lock: the release store in the unlock synchronizes with the acquire-load in the next lock, to ensure that the data modified while the lock was held is now visible to the second thread."
I have written a peterson lock in C
typedef struct {
volatile bool flag[2];
volatile int victim;
} peterson_lock_t;
void peterson_lock_init(peterson_lock_t &lock) {
lock.flag[0] = lock.flag[1] = false;
lock.victim = 0;
}
void peterson_lock(peterson_lock_t &lock, int id) {
lock.flag[id] = true;
lock.victim = id;
asm volatile ("mfence" : : : "memory");
while (lock.flag[1 - id] && lock.victim == id) {
};
}
void peterson_unlock(peterson_lock_t &lock, int id) {
lock.flag[id] = false;
}
I tested it and I think it's correct, right?
If it's right, my question is do I need to add sfence and lfence to “make sure the data modified while the lock was held is now visible to the second thread” ? like this,
void peterson_lock(peterson_lock_t &lock, int id) {
lock.flag[id] = true;
lock.victim = id;
asm volatile ("mfence" : : : "memory");
asm volatile ("lfence" : : : "memory"); // here, I think this is unnecessary, since mfence will flush load buffer
while (lock.flag[1 - id] && lock.victim == id) {
};
}
void peterson_unlock(peterson_lock_t &lock, int id) {
asm volatile ("sfence" : : : "memory"); // here
lock.flag[id] = false;
}
I think no need to do this. My understanding is that on x86/64 'store' has a release semantics, and 'load' has a acquire semantics(the root reason is on x86/64 there is only store load reorder), and 'lock.flag[id]= false' is a 'store', 'lock.flag[1 - id] ' is a 'load', so there is no need to do things like the acquire/release on the flag0 and flag1 in Dmitriy's implementation
EDIT @Anthony very appreciate your replay. Yes, I need to avoid compiler reorder. So, modification like below, is it correct? Because for x86, only need to forbidden compiler reorder in 'peterson_unlock'
void peterson_lock(peterson_lock_t &lock, int id) {
lock.flag[id] = true;
lock.victim = id;
asm volatile ("mfence" : : : "memory");
while (lock.flag[1 - id] && lock.victim == id) {
};
}
void peterson_unlock(peterson_lock_t &lock, int id) {
asm volatile ("" : : : "memory"); // here, forbidden compiler reorder
lock.flag[id] = false;
}
The use of atomic operations and their memory ordering flags does more than choose the instruction. It also affects the compiler's optimiser.
volatilereads and writes can't be reordered with each other, and must be issued, but can be freely reordered with other code.It is undefined behaviour to access a
volatilenon-atomic variable from multiple threads without synchronization, just as it is for non-volatilenon-atomic variables.Thus
may be reordered to
or
neither of which preserves the lock functionality.
Because a store with the
memory_order_releaseordering ensures that previous writes are visible to a later load withmemory_order_acquireordering, this essentially implies that the compiler cannot reorder prior stores past the unlock, if you use the atomic operations withmemory_order_release.Likewise, because a load with
memory_order_acquireordering ensures that writes from another thread may now be visible when they weren't previously, this essentially implies that the compiler cannot reorder subsequent loads before the lock, if you use the atomic operations withmemory_order_acquire.In short: you need the memory ordering constraints in the lock and unlock, not just for the choice of instruction, but also (and just as importantly) for the effects on the compiler.
For x86, relaxed, acquire and seq_cst loads are all unfenced
movinstructions, but their effects on the compiler are quite different.If you do not require the memory ordering semantics of the atomic operations, use
memory_order_relaxedon all your operations. This will ensure the atomicity of the operation (and avoid undefined behaviour), without adding extraneous ordering requirements. Thus, whenever you would be tempted to use avolatilevariable for synchronization, you should instead use anatomicvariable, with appropriate memory orderings (includingmemory_order_relaxed).You should never need to add additional
asmstatements for synchronization in C++ code. The atomic operations and fence functions are sufficient.Your code is still not correct, as both the
flagvariables andvictimvariable are touched by multiple threads, and are not atomic.