Minimizing Meyers' singleton overhead


A Meyers singleton such as:

Foo& getSingleton() {
    static Foo singleton;
    return singleton;
}

is known to:

  • Be thread-safe.
  • Contain one or more branches.

It also happens to be the easiest way to avoid the static initialization order fiasco.

But if I'm executing in a single-threaded environment, the cost of its thread safety (which can't be disabled by standard language means) might be critical to performance.

What approaches does the industry use to get rid of the extra atomic flag in a Meyers singleton and to minimize branches?

3

There are 3 best solutions below

1
Jakob Stark On BEST ANSWER

For GCC, Clang, and MSVC there are specific command-line options to turn off thread-safe initialization of local static variables: -fno-threadsafe-statics for GCC and Clang, /Zc:threadSafeInit- for MSVC [1][2][3].

That being said, before using these flags (which basically switch the compiler to a non-standard C++ mode), one should have carefully measured and identified the synchronization on the singleton as a hotspot worth optimizing.

[1] https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html#index-fno-threadsafe-statics
[2] https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fthreadsafe-statics
[3] https://learn.microsoft.com/en-us/cpp/build/reference/zc-threadsafeinit-thread-safe-local-static-initialization
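As a rough illustration of how these flags would be applied, here is a minimal sketch. The file name singleton.cpp, the construction counter and the helper checkSingleInstance are assumptions made up for the example; only the flag names come from the linked documentation.

```cpp
// Build sketch; the flags are the ones documented at [1], [2] and [3]:
//   g++     -O2 -fno-threadsafe-statics singleton.cpp
//   clang++ -O2 -fno-threadsafe-statics singleton.cpp
//   cl      /O2 /Zc:threadSafeInit- singleton.cpp
#include <cassert>

// Hypothetical stand-in for the question's Foo. The counter exists only
// so the example can show that the constructor runs exactly once.
int g_constructions = 0;

struct Foo {
    Foo() { ++g_constructions; }
};

Foo& getSingleton() {
    // With the flags above, the first-use check is no longer required to
    // be thread-safe. Calling this concurrently from several threads
    // during initialization is then a data race.
    static Foo singleton;
    return singleton;
}

// Usage: every call yields the same object, constructed on first use.
void checkSingleInstance() {
    Foo& a = getSingleton();
    Foo& b = getSingleton();
    assert(&a == &b);
    assert(g_constructions == 1);
}
```

The source code itself does not change; only the generated initialization check does, so comparing the assembly with and without the flag is the easiest way to see what was actually removed.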

0
James Kanze On

Most compilers have options to compile with or without threads -- compiling without threads should generate a version which doesn't add any thread-dependent overhead. More generally, however, I would expect the thread overhead to be negligible in a single-threaded program -- acquiring an uncontested mutex normally only costs a single access to an atomic bool -- not the end of the world. I wouldn't worry about it until the profiler shows it to be a real problem.
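To make the "single atomic access" cost concrete, here is a hand-written sketch of a lazily initialized singleton using double-checked locking. It is not what any particular compiler emits for a local static; the names getSingletonManual, g_instance, g_init_mutex and g_storage are made up for the illustration, and the object is deliberately never destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical stand-in for the question's Foo; the counter only shows
// that construction happens once.
int g_constructions = 0;

struct Foo {
    Foo() { ++g_constructions; }
};

std::atomic<Foo*> g_instance{nullptr};              // published pointer
std::mutex g_init_mutex;                            // taken on the slow path only
alignas(Foo) unsigned char g_storage[sizeof(Foo)];  // raw storage for the object

Foo& getSingletonManual() {
    // Fast path: one acquire load and one branch once the object exists.
    Foo* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        // Slow path: serialize initialization and re-check under the lock.
        std::lock_guard<std::mutex> lock(g_init_mutex);
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new (g_storage) Foo();
            g_instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}

// Usage: repeated calls return the same object.
void checkManualSingleton() {
    assert(&getSingletonManual() == &getSingletonManual());
    assert(g_constructions == 1);
}
```

After the first call, the mutex is never touched again, which is why the steady-state overhead is expected to be small.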

0
Mike Vine On

A modern Meyers singleton does not require any atomic, fence or locking operations once the singleton has been created, and it is still thread-safe. This means the normal path (i.e. any call other than the first time a thread encounters the static) is very fast.

On MSVC x64, for example, the initialisation check of a Meyers singleton requires 5 normal loads and a compare (plus normal function call overhead).

It does this by using TLS instead of atomics. Basically it keeps a table of epochs per thread and knows the "high watermark" of each thread. It can then do a normal compare of that watermark with a value which represents when the static was created, in order to know whether it needs to do more complex multi-threaded or atomic work (which it only needs to do at most once per thread).

See lines 510 to 516 of https://godbolt.org/z/caGr6oqxa

The compare will also be well predicted, as it is always true after the first call on each thread. So in effect there are no atomic operations and no branches with any real cost, just a handful of loads.
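The epoch idea described above can be sketched in portable C++ roughly as follows. This is a simplified illustration under my own assumptions, not the MSVC runtime: all names (g_guard, g_global_epoch, t_thread_epoch, slowPath, getSingletonEpoch) are hypothetical, the object is constructed while holding the lock, exceptions are ignored, and the guard is a relaxed atomic so that the sketch stays free of data races.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>
#include <new>

int g_constructions = 0;   // only to show construct-once
int g_slow_path_calls = 0; // only to show how often the slow path runs

struct Foo {
    Foo() { ++g_constructions; }
};

constexpr int kUninitialized = 0;   // guard value before construction
constexpr int kEpochStart = INT_MIN;

std::mutex g_mutex;
int g_global_epoch = kEpochStart;               // protected by g_mutex
thread_local int t_thread_epoch = kEpochStart;  // per-thread high watermark
std::atomic<int> g_guard{kUninitialized};       // epoch at which Foo was created
alignas(Foo) unsigned char g_storage[sizeof(Foo)];
Foo* g_foo = nullptr;                           // written once, under g_mutex

void slowPath() {
    std::lock_guard<std::mutex> lock(g_mutex);
    ++g_slow_path_calls;
    if (g_guard.load(std::memory_order_relaxed) == kUninitialized) {
        g_foo = new (g_storage) Foo();
        ++g_global_epoch;  // epochs stay negative, so they never equal 0
        g_guard.store(g_global_epoch, std::memory_order_relaxed);
    }
    // Taking the lock synchronizes this thread with every initialization
    // finished so far; remember that as its new watermark.
    t_thread_epoch = g_global_epoch;
}

Foo& getSingletonEpoch() {
    // Fast path: plain loads and one compare. The guard is "newer" than
    // this thread's watermark only until the thread has passed through
    // the slow path once after the object was created.
    if (g_guard.load(std::memory_order_relaxed) > t_thread_epoch) {
        slowPath();
    }
    return *g_foo;
}

// Usage: the second call on the same thread skips the slow path.
void checkEpochSingleton() {
    Foo& a = getSingletonEpoch();
    Foo& b = getSingletonEpoch();
    assert(&a == &b);
    assert(g_constructions == 1);
    assert(g_slow_path_calls == 1);
}
```

On x86-64 a relaxed atomic load compiles to an ordinary load, so the fast path here is also just loads plus a predictable compare; the real runtime may differ in its details.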

You really do not have to worry about the cost of singletons.

You may want to check whether your compiler supports this feature.

But obviously, if you are worried then run some performance tests.
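A performance test along these lines could start from a small timing harness like the sketch below. The names BenchResult and benchSingleton, and the Foo with a single int member, are assumptions for the example. An optimizer may inline the call or hoist the initialization check out of the loop, so the numbers are only meaningful together with a look at the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the question's Foo.
struct Foo {
    Foo() : value(1) {}
    int value;
};

Foo& getSingleton() {
    static Foo singleton;
    return singleton;
}

struct BenchResult {
    std::uint64_t checksum;  // keeps the loop from being optimized away entirely
    double nsPerCall;        // average wall-clock time per call in nanoseconds
};

BenchResult benchSingleton(std::uint64_t iterations) {
    std::uint64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sum += static_cast<std::uint64_t>(getSingleton().value);
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return BenchResult{
        sum, iterations ? ns / static_cast<double>(iterations) : 0.0};
}

// Usage: a quick smoke run; real measurements need far more iterations.
void runSmokeBenchmark() {
    const BenchResult r = benchSingleton(1000);
    assert(r.checksum == 1000);
}
```

Running the same harness with and without the thread-safety flags, or against a plain global object, gives a first estimate of whether the difference matters for the workload at hand.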