Best way to implement a periodic Linux task in C++20

I have a periodic task in C++ running on an embedded Linux platform; it has to run at 5 ms intervals. It seems to be working as expected, but is my current solution good enough?

I have implemented the scheduler using sleep_until(), but some comments I have received say that setitimer() is better. As I would like the application to be at least somewhat portable, I would prefer the C++ standard library... unless, of course, there are other problems.

I have found plenty of sites that show an implementation of each, but I have not found any arguments for why one solution is better than the other. As I see it, sleep_until() will do an "optimal" sleep on any (supported) platform, and I get the feeling the comments I have received are aimed more at usleep() (which I do not use).
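
For reference, the setitimer() alternative from the comments would look roughly like the sketch below. This is only my understanding of how it would be wired up (SIGALRM handler, 5000 µs period); I do not use it:

#include <csignal>
#include <sys/time.h>
#include <unistd.h>

volatile sig_atomic_t tick = 0;

void on_alarm(int) { tick = 1; }

int main() {
    struct sigaction sa {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, nullptr);

    itimerval tv {};
    tv.it_value    = {0, 5000};   // first expiry after 5 ms
    tv.it_interval = {0, 5000};   // then every 5 ms
    setitimer(ITIMER_REAL, &tv, nullptr);

    while (true) {
        pause();                  // block until SIGALRM arrives
        if (tick) {
            tick = 0;
            // read and process the data here
        }
    }
}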

My implementation looks a little like this:

#include <chrono>
#include <cstdlib>
#include <ratio>
#include <thread>

void do_the_magic();                                        // the actual work (elided)
std::chrono::steady_clock::time_point next_period_start();  // see the sketch below

// True when the clock can represent intervals finer than one millisecond.
bool is_submilli_capable() {
    return std::ratio_greater<std::milli,
            std::chrono::system_clock::period>::value;
}

int main() {
    if (not is_submilli_capable())
        exit(1);

    while (true) {
        auto next_time = next_period_start();   // absolute start of the next period
        do_the_magic();
        std::this_thread::sleep_until(next_time);
    }
}
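
next_period_start() is elided above; a minimal sketch of what I have in mind, assuming a fixed 5 ms grid and std::chrono::steady_clock (which, unlike system_clock, cannot jump when the wall clock is adjusted):

std::chrono::steady_clock::time_point next_period_start() {
    using namespace std::chrono;
    // Advance along a fixed grid anchored at start-up, so a slow
    // do_the_magic() does not shift all subsequent periods.
    static auto next = steady_clock::now();
    next += 5ms;
    return next;
}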

A short summary of the issue:

  • I have an embedded Linux platform, built with Yocto and with RT capabilities
  • The application needs to read and process incoming data every 5 ms
  • Building with gcc 11.2.0
  • Using C++20
  • All the "hard work" is done in separate threads, so this question only regards triggering the task periodically and with minimal jitter

2 Answers

Wolfie (score 2):

Since the application is supposed to read and process the data every 5 ms, it is possible that it sometimes skips the required operations. What I mean is that in a time interval of 20 ms, do_the_magic() is supposed to be invoked 4 times... but if do_the_magic() takes 10 ms to execute, it will get invoked only 2 times. If that is an acceptable outcome, the current implementation is good enough.

Since the application is reading data, it probably receives it from the network or disk. Adding the overhead of processing it, it may well take more than 5 ms in total (depending on the size of the data). If missing an invocation of do_the_magic() is not acceptable, the current implementation is not good enough.

What you could do instead is create a few threads. Each thread executes do_the_magic() and then goes back to sleep. Every 5 ms, you wake one sleeping thread, which will almost certainly take less than 5 ms. This way no invocation of do_the_magic() is missed. The number of threads needed depends on how long do_the_magic() takes to execute.

#include <array>
#include <chrono>
#include <cstdlib>
#include <ratio>
#include <semaphore>
#include <thread>
#include <vector>

constexpr int NUM_THREADS = 4;    // tune to the worst-case runtime of do_the_magic()

void do_the_magic();                                        // the 5 ms work item
std::chrono::steady_clock::time_point next_period_start();  // as in the question

// One semaphore per worker: thread i sleeps on sems[i] until it is released.
std::array<std::binary_semaphore, NUM_THREADS> sems{
    std::binary_semaphore{0}, std::binary_semaphore{0},
    std::binary_semaphore{0}, std::binary_semaphore{0}};

bool is_submilli_capable() {
    return std::ratio_greater<std::milli,
        std::chrono::system_clock::period>::value;
}

void wake_some_thread() {
    static int i = 0;
    sems[i].release();            // release the semaphore associated with thread i
    i = (i + 1) % NUM_THREADS;    // round-robin over the workers
}

void thread_func(int id) {
    while (true) {
        sems[id].acquire();       // block until the scheduler wakes this thread
        do_the_magic();
    }
}

int main() {
    if (not is_submilli_capable())
        exit(1);

    std::vector<std::jthread> workers;
    for (int i = 0; i < NUM_THREADS; ++i)
        workers.emplace_back(thread_func, i);

    while (true) {
        auto next_time = next_period_start();
        wake_some_thread();       // releases a semaphore to wake a worker
        std::this_thread::sleep_until(next_time);
    }
}

Create as many semaphores as there are threads, where thread i waits on semaphore i. wake_some_thread() then releases semaphores round-robin, from index 0 up to NUM_THREADS - 1, and starts over.

SK-logic (score 0):

5 ms is pretty tight timing.

You can get a jitter-free 5 ms tick only if you do all of the following:

  • Isolate a CPU for this thread. Configure it with nohz_full and rcu_nocbs (a kernel command-line sketch follows this list).

  • Pin your thread to this CPU and assign it a real-time scheduling policy (e.g., SCHED_FIFO); see the code sketch after this list.

  • Do not let any other threads run on this CPU core.

  • Do not allow any context switches in this thread. This includes avoiding system calls altogether, i.e., you cannot use std::this_thread::sleep_until(...) or any other blocking call.

  • Do a busy wait between work periods (expect 100% utilisation of that core).

  • Use lock-free communication to transfer data from this thread to other, non-real-time threads, e.g., for storing the data to files, accessing the network, logging to the console, etc.
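
To illustrate the first bullet: CPU isolation is configured on the kernel command line. Assuming CPU 3 is the core being dedicated, it would be something like:

isolcpus=3 nohz_full=3 rcu_nocbs=3

A sketch of the pinning, SCHED_FIFO and busy-wait bullets follows; the CPU number and priority are placeholders, and do_the_magic() stands in for the question's work item:

#include <chrono>
#include <pthread.h>
#include <sched.h>

void do_the_magic();              // the question's 5 ms work item

void make_realtime(int cpu) {
    // Pin the calling thread to the isolated CPU.
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    // Switch the thread to the SCHED_FIFO real-time policy.
    sched_param sp{};
    sp.sched_priority = 80;       // placeholder priority
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

void realtime_loop() {
    make_realtime(3);             // assumed isolated CPU
    auto next = std::chrono::steady_clock::now();
    while (true) {
        next += std::chrono::milliseconds(5);
        // Busy-wait until the next 5 ms boundary; on Linux these clock
        // reads go through the vDSO, so the thread never enters the kernel.
        while (std::chrono::steady_clock::now() < next) { /* spin */ }
        do_the_magic();
    }
}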

Now, the question is how you're going to "read and process data" without system calls. It depends on your system. If you can do user-space I/O (map the physical register addresses into your process's address space, use DMA without interrupts, etc.), you'll have perfectly real-time processing. Otherwise, any system call will trigger a context switch, and the latency of that context switch will be unpredictable.
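
For the register-mapping route, the usual mechanism is mmap() on /dev/mem. A sketch, where the physical base address and size are purely hypothetical placeholders for your device:

#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

constexpr off_t  REG_PHYS_BASE = 0x43800000;  // hypothetical register block
constexpr size_t REG_SIZE      = 0x1000;

volatile uint32_t* map_registers() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return nullptr;
    void* p = mmap(nullptr, REG_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, REG_PHYS_BASE);
    close(fd);                                // the mapping survives the close
    return p == MAP_FAILED ? nullptr
                           : static_cast<volatile uint32_t*>(p);
}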

For example, you can do this with certain Ethernet devices (SolarFlare, etc.) that ship 100% user-space drivers. For anything else you're likely to have to write your own user-space driver, or even implement your own interrupt-free device (e.g., if you're running on an FPGA SoC).