I found some multithreaded code in the quite popular LazyCache library that uses an int[] field as a granular locking mechanism, with the intention of preventing concurrent invocation of a method with the same key as argument. I am highly skeptical about the correctness of this code, because no Interlocked or Volatile operation is used when exiting the protected region. Here is the important part of the code:
private readonly int[] keyLocks;

public virtual T GetOrAdd<T>(string key, Func<ICacheEntry, T> addItemFactory,
    MemoryCacheEntryOptions policy)
{
    /* Do stuff */
    object cacheItem;
    // acquire lock per key
    uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
    while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) Thread.Yield();
    try
    {
        cacheItem = CacheProvider.GetOrCreate<object>(key, CacheFactory);
    }
    finally
    {
        keyLocks[hash] = 0;
    }
    /* Do more stuff */
}
The protected method call is CacheProvider.GetOrCreate<object>(key, CacheFactory). It is supposed to be called by one thread at a time for the same key. To enter the protected region, there is a while loop that uses Interlocked.CompareExchange to change a value of the keyLocks array from 0 to 1. So far so good. The part that concerns me is the line that exits the protected region: keyLocks[hash] = 0;. Since there is no barrier there, my understanding is that the C# compiler and the .NET Jitter are free to move instructions in either direction, stepping over this line. So an instruction inside the CacheProvider.GetOrCreate method can be moved after the keyLocks[hash] = 0;.
My question is: according to the specs, does the code above really ensure that CacheProvider.GetOrCreate will not be called concurrently with the same key? Is the promise of mutual exclusion fulfilled by this code? Or is the code just buggy?
Context: The relevant code was added in the library in this pull request: Optimize cache to lock per key.
Looks buggy to me; the keyLocks[hash] = 0; is not a release store, so parts of Do stuff can reorder out of the critical section, potentially becoming visible to another thread only after it acquires the lock. (Potentially reading already-modified data, or more likely having stores appear late and step on stores from the next thread, or not be seen by its loads.)
It will very likely compile to correct asm on x86, where all asm stores have "release" semantics so only compile-time reordering could break things, but not on ARM / AArch64 or other mainstream ISAs that are weakly ordered. So testing on x86 can't reveal this bug unless you actually do get compile-time reordering. (It's still broken, the bug is just dormant.)
https://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/ demos a spinlock in C++ that uses relaxed instead of acquire/release, and that it breaks in practice on ARM. That example is exactly like this, except here the CAS is like C++ memory_order_seq_cst so the top of the critical section is strong enough. But that's not sufficient; stronger ordering for taking the lock doesn't save you from too weak an unlock.
A basic spinlock needs an acquire RMW to get exclusive ownership, and a release store to unlock, hence the names. That's sufficient to keep Do stuff contained inside the critical section in that direction.
In C#, a release store can be done with Volatile.Write, or via assignment to a volatile object. My understanding is that those are equivalent to C++ foo.store(val, std::memory_order_release).
Related x86 asm examples and spinlock discussion:
Interlocked.Exchange, but does need to be release)
Thread.Yield() instead of SpinWait.SpinOnce(), which might be good if you have more threads than cores and critical sections tend to take a long time to unlock.
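As a rough sketch of the acquire/release pairing described above, the unlock could be written with Volatile.Write so that it is a release store. The KeyedSpinLock and KeyedSpinLockDemo names are made up for illustration and are not part of LazyCache; only Interlocked.CompareExchange, Volatile.Write, and Thread.Yield are real .NET calls here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not LazyCache code): a striped spinlock indexed by key hash.
public sealed class KeyedSpinLock
{
    private readonly int[] keyLocks;

    public KeyedSpinLock(int stripes)
    {
        if (stripes <= 0) throw new ArgumentOutOfRangeException(nameof(stripes));
        keyLocks = new int[stripes];
    }

    // Spins until the slot for this key goes from 0 to 1, then returns the slot index.
    public int Enter(string key)
    {
        int slot = (int)((uint)key.GetHashCode() % (uint)keyLocks.Length);
        // Interlocked operations act as full fences, which is at least acquire.
        while (Interlocked.CompareExchange(ref keyLocks[slot], 1, 0) != 0)
            Thread.Yield();
        return slot;
    }

    // Unlocks with a release store, so earlier reads and writes made inside
    // the critical section cannot be moved after the unlock.
    public void Exit(int slot)
    {
        Volatile.Write(ref keyLocks[slot], 0);
    }
}

public static class KeyedSpinLockDemo
{
    // Increments a plain (non-atomic) counter from several workers,
    // all contending on the same key, and returns the final value.
    public static int Run(int workers, int iterationsPerWorker)
    {
        var locks = new KeyedSpinLock(8);
        int counter = 0; // protected only by the spinlock

        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < iterationsPerWorker; i++)
            {
                int slot = locks.Enter("same-key");
                try
                {
                    counter++;
                }
                finally
                {
                    locks.Exit(slot);
                }
            }
        });

        return counter;
    }
}
```

In the code from the question, the equivalent change would be replacing keyLocks[hash] = 0; with Volatile.Write(ref keyLocks[hash], 0);. Passing a stress test like KeyedSpinLockDemo.Run on x86 would not prove much either way, for the reasons given above about x86 stores already having release semantics.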