C11: Use volatile for simple communication between threads


I'm using C (more precisely, C11 with gcc) to develop low-latency software for x64 (more precisely, Intel CPUs only). I don't care about portability or any other architecture.

I know that volatile is generally not the first choice for data synchronization. However, these three facts seem to be true:

  • volatile forces every write to go to memory and every read to come from memory (so the compiler is not allowed to "cache" the value in a register, which also rules out some optimizations)
  • volatile accesses must not be reordered by the compiler relative to other volatile accesses
  • naturally aligned 4-byte (and even 8-byte) values are always written atomically on x64 (the same is true for reads)
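The size, alignment and lock-freedom assumptions behind the third point can be checked at compile time. The following is only a minimal sketch, assuming gcc in C11 mode on x86-64; the helper name int_is_always_lock_free is hypothetical and exists purely for illustration.

```c
#include <stdalign.h>
#include <stdatomic.h>

/* Compilation fails if any of the assumptions about int does not hold. */
_Static_assert(sizeof(int) == 4, "int is expected to be 4 bytes");
_Static_assert(alignof(int) == 4, "int is expected to be naturally aligned");
_Static_assert(ATOMIC_INT_LOCK_FREE == 2,
               "atomic int is expected to be always lock-free");

/* Hypothetical helper: returns 1 if atomic int is always lock-free on this target. */
static int int_is_always_lock_free(void)
{
    return ATOMIC_INT_LOCK_FREE == 2;
}
```

These checks only cover the flag. The Data struct is far larger than 8 bytes, so copying it is never a single atomic access and it has to be protected by the data_ready handshake.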

Now I have this code:

#include <stdbool.h> // for true

typedef struct {
    double some_data;
    double more_data;
    char even_more_data[123];
} Data;

static volatile Data data;
static volatile int data_ready = 0;

void thread1()
{
    while (true) {
        while (data_ready) ;

        const Data x = f(...); // prepare some data
        data         = x;      // write it
        data_ready   = 1;      // signal that the data is ready  
    }
}

void thread2()
{
    while (true) {
        while (!data_ready) ;

        const Data x = data; // copy data
        data_ready   = 0;    // signal that data is copied
        g(x);                // process data
    }
}

thread1 is a producer of Data and thread2 is a consumer of Data. Note that I rely on these facts:

  • data is written before data_ready. So when thread2 reads data_ready and sees 1, we know that data is also available (guaranteed by the ordering of volatile accesses)
  • thread2 first reads and copies data and only then sets data_ready to 0, so thread1 can produce new data and store it.
  • data_ready cannot be in a weird state, because reading and writing an int (4 bytes) is automatically atomic on x64

This was the fastest option I ended up with. Note that both threads are pinned to isolated cores. They busy-poll on data_ready, because it's important for me to process the data as fast as possible.

Atomics and mutexes were slower, so I used this implementation.
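For comparison, the same handshake can be sketched with C11 atomics using explicit release/acquire ordering instead of the default sequentially consistent ordering. This is only a sketch under stated assumptions: the function names try_publish and try_consume are hypothetical, and each call is a single non-blocking attempt rather than the busy loop itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    double some_data;
    double more_data;
    char even_more_data[123];
} Data;

static Data data;                  /* plain object, ordered by data_ready */
static atomic_int data_ready = 0;  /* 0: slot is empty, 1: slot holds data */

/* Producer side (hypothetical name): copies *src into the slot if it is empty.
 * Returns true on success, false if the consumer has not taken the old value yet. */
static bool try_publish(const Data *src)
{
    if (atomic_load_explicit(&data_ready, memory_order_acquire))
        return false;
    data = *src;
    /* Release: the write to data becomes visible before the flag does. */
    atomic_store_explicit(&data_ready, 1, memory_order_release);
    return true;
}

/* Consumer side (hypothetical name): copies the slot into *dst if it is full.
 * Returns true on success, false if nothing has been published yet. */
static bool try_consume(Data *dst)
{
    /* Acquire: if the flag is seen as 1, the producer's write to data is visible. */
    if (!atomic_load_explicit(&data_ready, memory_order_acquire))
        return false;
    *dst = data;
    /* Release: the copy is finished before the producer may overwrite data. */
    atomic_store_explicit(&data_ready, 0, memory_order_release);
    return true;
}
```

Each thread would spin on its side, for example while (!try_consume(&x)) ; in thread2. With gcc on x86-64, acquire loads and release stores typically compile to plain mov instructions, whereas a default (sequentially consistent) atomic store usually needs an xchg or a fence, which may account for part of the slowdown seen with atomics.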

Finally, my question: is it possible that this does not behave as I expect? I cannot find anything wrong in the logic shown, but I know that volatile is a tricky beast.

Thanks a lot
