So I've been researching this topic for quite some time now, and I think I understand the most important concepts, such as release and acquire memory fences.
However, I haven't found a satisfactory explanation of the relation between `volatile` and the caching of main memory.
So, I understand that every read and write to/from a `volatile` field enforces strict ordering of the read and write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about when these changes become visible to other threads/processors. In particular, this depends on when the cache is flushed (if at all). I remember reading a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, at what granularity is the cache disabled?
Also, I read something about strong and weak volatile semantics, and that C# follows the strong semantics, where every write always goes straight to main memory regardless of whether it's a volatile field or not. I am very confused about all of this.
I'll address the last question first. Microsoft's .NET implementation has release semantics on writes¹. That's a property of the implementation, not of C# per se, so the same program, no matter the language, may have weak non-volatile writes under a different implementation.
The visibility of side-effects is about multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap, and that some sort of synchronization is required to communicate side-effects between threads.
So, what does C# say? The C# language specification (in its newer drafts) fundamentally says the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271), with some differences, which I'll discuss later on.
So, what does the CLI say? That only volatile operations are visible side-effects.
Note that it also says that non-volatile operations on the heap are side-effects as well, just not guaranteed to be visible. Just as importantly², it doesn't state that they're guaranteed *not* to be visible either.
What exactly happens on volatile operations? A volatile read has acquire semantics: it precedes any following memory reference. A volatile write has release semantics: it follows any preceding memory reference.
Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.
`Interlocked` operations have both acquire and release semantics.

There's another important term to learn, which is atomicity.

Reads and writes, volatile or not, are guaranteed to be atomic on primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as long structs, the operations are not atomic; they may require multiple, independent memory accesses.

However, even with volatile semantics, read-modify-write operations, such as `v += 1` or the equivalent `++v` (or `v++`, in terms of side-effects), are not atomic.

Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still some expected value. .NET also has an atomic `Interlocked.Read(ref long)` method for 64-bit integers, which works even on 32-bit architectures.

I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and to either or both as volatile operations.
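For instance, here's a minimal sketch (the class and field names are just for illustration) contrasting a non-atomic increment with an interlocked one:

```csharp
using System.Threading;

public class Counters
{
    private int counter;

    public void UnsafeIncrement()
    {
        // Read-modify-write: a read, an addition and a write.
        // Two threads may read the same value and lose an increment.
        counter++;
    }

    public void SafeIncrement()
    {
        // A single atomic read-modify-write operation.
        Interlocked.Increment(ref counter);
    }
}
```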
What does this all mean in terms of order?
That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.
That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.
That volatile operations within a thread are sequential and may not be reordered.
That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be any of the following:

- V1 V2 V3 V4
- V1 V3 V2 V4
- V1 V3 V4 V2
- V3 V1 V2 V4
- V3 V1 V4 V2
- V3 V4 V1 V2
That is, any possible order of observed side-effects is valid for any thread for a single execution. There is no requirement on total ordering, such that all threads observe only one of the possible orders for a single execution.
How are things synchronized?
Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.
In practice, you must detect whether a volatile read in one thread happened after a volatile write in another thread³. Here's a basic example:
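A minimal sketch of such a pair of methods (the class and field names are illustrative):

```csharp
public class InefficientEvent
{
    private volatile bool signalled;

    public void Signal()
    {
        // Volatile write (release): all preceding side-effects
        // are made visible along with this write.
        signalled = true;
    }

    public void InefficientWait()
    {
        // Volatile read (acquire) in a busy loop: once it observes
        // the write, the writer's preceding side-effects are visible.
        while (!signalled)
        {
        }
    }
}
```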
However generally inefficient, you can run two different threads, such that one calls `InefficientWait()` and another one calls `Signal()`, and the side-effects of the latter when it returns from `Signal()` become visible to the former when it returns from `InefficientWait()`.

Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.
If you ever reach high contention for fast locks (locks taken only for a few reads and writes without blocking), then depending on the amount of contention, switching to interlocked operations may either improve or decrease performance. Especially so when you have to resort to compare-and-swap cycles, such as:
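The following is a rough sketch of such a cycle (the field name and the value-computing delegate are illustrative); note the use of `SpinWait`, referred to further below:

```csharp
using System;
using System.Threading;

public class CasCell
{
    private int value;

    public void Update(Func<int, int> computeNewValue)
    {
        var spinWait = new SpinWait();
        var currentValue = Volatile.Read(ref value);
        while (true)
        {
            var newValue = computeNewValue(currentValue);
            // Write newValue only if value still holds currentValue;
            // returns whatever value was actually there before.
            var witnessed = Interlocked.CompareExchange(ref value, newValue, currentValue);
            if (witnessed == currentValue)
            {
                break; // the CAS succeeded
            }
            // Another thread got in between: spin briefly and retry
            // with the freshly observed value.
            currentValue = witnessed;
            spinWait.SpinOnce();
        }
    }
}
```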
Meaning, you have to profile the solution as well and compare with the current state. And be aware of the A-B-A problem.
There's also `SpinLock`, which you must really profile against monitor-based locks, because although it may make the current thread yield, it doesn't put the current thread to sleep, akin to the shown usage of `SpinWait`.

Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect it.
Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
What are the differences between the C# specification and the CLI specification regarding volatile operations?
C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.
C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, `lock` statements, and thread creation and termination.

If we take critical execution points as points where side-effects become visible, it adds to the CLI specification that thread creation and termination are visible side-effects, i.e. `new Thread(...).Start()` has release semantics on the current thread and acquire semantics at the start of the new thread, and exiting a thread has release semantics on the current thread and `thread.Join()` has acquire semantics on the waiting thread (see the sketch below).

C# doesn't mention volatile operations in general, such as those performed by the classes in `System.Threading`, as opposed to only those performed through fields declared as `volatile` and through the `lock` statement. I believe this is not intentional.

C# states that captured variables can be simultaneously exposed to multiple threads. The CLI doesn't mention it, because closures are a language construct.
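Here's a minimal sketch of those thread creation and termination semantics (the class, field and values are illustrative):

```csharp
using System;
using System.Threading;

public class StartJoinVisibility
{
    private int data; // non-volatile

    public void Run()
    {
        data = 42; // written before Start()
        var thread = new Thread(() =>
        {
            // Start() has release/acquire semantics,
            // so this thread is guaranteed to see 42 here.
            data += 1;
        });
        thread.Start();
        thread.Join();
        // Join() has acquire semantics on the waiting thread,
        // so this read is guaranteed to see the new thread's write: 43.
        Console.WriteLine(data);
    }
}
```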
1.
There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics:
- Memory Model, by Chris Brumme
- Memory Models: Understand the Impact of Low-Lock Techniques in Multithreaded Apps, by Vance Morrison
- CLR 2.0 memory model, by Joe Duffy
- Which managed memory model?, by Eric Eilebrecht
- C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.
2.
There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.
- The problem with being second, by Grant Richins
- Comments on The CLI memory model, and specific specifications, by Jon Skeet
- C# - The C# Memory Model in Theory and Practice, Part 2, by Igor Ostrovsky
However, introducing reads is acceptable only for arguments and local variables.
For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such an introduction may break the order of execution as seen from the current thread of execution, whether due to legitimate changes in other threads or to changes through reflection.
That is, assuming a non-volatile reference field (`field` here is an illustrative name), you can't turn this:
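```csharp
object local = field; // a single read of field
if (local != null)
{
    // code that uses local's members, e.g.:
    local.ToString();
}
```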
into this:
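```csharp
if (field != null) // first read of field
{
    // an introduced second read of field may observe null
    // if another thread has written to field in between:
    field.ToString();
}
```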
if you can ever tell the difference. Specifically, a `NullReferenceException` being thrown by accessing `local`'s members.

In the case of C#'s captured variables, they're equivalent to instance fields.
It's important to note that the CLI standard:

- says that non-volatile accesses are not guaranteed to be visible
- doesn't say that non-volatile accesses are guaranteed to not be visible
- says that volatile accesses affect the visibility of non-volatile accesses
But for local variables and arguments (here `local1` and `local2`, illustrative names), you can turn this:
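```csharp
var local2 = local1; // local1 is a local variable or argument
if (local2 != null)
{
    // code that reads local2
    local2.ToString();
}
```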
into this:
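```csharp
if (local1 != null)
{
    // reads of local2 replaced with reads of local1:
    // no other thread can observe or modify a local
    local1.ToString();
}
```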
Similarly, with a single read of a non-volatile field into a local, you can turn this:
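```csharp
var local = field; // a single read of field
local?.ToString();
```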
into this:
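```csharp
var local = field;
var temp = local; // an introduced local is harmless
if (temp != null)
{
    temp.ToString();
}
```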
or this:
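```csharp
var local = field;
if (local != null)
{
    local.ToString();
}
```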
because you can't ever tell the difference. But again, you cannot turn it into this:
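```csharp
if (field != null)
{
    // an introduced second read of field,
    // which may observe a different value, including null:
    field.ToString();
}
```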
I believe it was prudent of both specifications to state that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, rather than allowing reads and writes to be introduced and eliminated in general.
Note that read elimination may be performed by either the C# compiler or the JIT compiler, i.e. multiple reads of the same non-volatile field, separated by instructions that don't write to that field and that don't perform volatile operations (or their equivalent), may be collapsed to a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:
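A sketch of such a worker (the member names follow the discussion below; everything else is illustrative):

```csharp
using System.Threading;

public class Worker
{
    private bool working;
    private bool stop; // note: non-volatile

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }

    public void Stop()
    {
        stop = true;
    }
}
```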
There's no guarantee that `Stop()` will stop the worker. Microsoft's .NET implementation guarantees that `stop = true;` is a visible side-effect, but it doesn't guarantee that the read of `stop` inside `Work()` is not elided to this:
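```csharp
public void Work()
{
    bool localStop = stop; // a single read, hoisted out of the loop
    while (!localStop)
    {
        // TODO: actual work without volatile operations
    }
}
```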
That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block, or indirectly in the whole call tree of methods and properties.

For this specific case, one correct implementation is to declare `stop` as `volatile`. But there are more options, such as using the equivalent `Volatile.Read` and `Volatile.Write`, using `Interlocked.CompareExchange`, using a `lock` statement around accesses to `stop`, using something equivalent to a lock, such as a `Mutex`, or a `Semaphore` or `SemaphoreSlim` if you don't want the lock to have thread affinity (i.e. you can release it on a different thread than the one that acquired it), or using a `ManualResetEvent` or `ManualResetEventSlim` instead of `stop`, in which case you can make `Work()` sleep with a timeout while waiting for a stop signal before the next iteration, etc.

3.
One significant difference between .NET's volatile synchronization and Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:
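A sketch of what such code might look like (the method names `DoWork1` and `DoWork2` follow the discussion below; the fields, values and sleep times are illustrative):

```csharp
using System;
using System.Threading;

public class SurrealSynchronizer
{
    private int state;        // non-volatile data
    private volatile bool v1;
    private volatile bool v2;

    public void DoWork1()
    {
        Thread.Sleep(100);
        state = 42; // non-volatile write
        v1 = true;  // volatile write (release)
    }

    public void DoWork2()
    {
        Thread.Sleep(200); // "guaranteed" to run after DoWork1's writes
        var observed = v2; // volatile read (acquire), of a different field
        // In .NET, if the acquire really happened after the release,
        // this is guaranteed to print 42.
        Console.WriteLine("{0} {1}", state, observed);
    }
}
```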
This surreal example expects threads and `Thread.Sleep(int)` to take an exact amount of time. If this is so, it synchronizes correctly, because `DoWork2` performs a volatile read (acquire) after `DoWork1` performs a volatile write (release).

In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In `DoWork2`, you'd have to read from the same volatile field you wrote to in `DoWork1`.