Why would adding the "noexcept" keyword hurt a function's performance?


Edit: I was not so much concerned with my own example. I can see why you would want a reproducible example, because of course you might not trust that adding noexcept could hurt performance at all. I decided to rewrite the post and add a full example.

Problem & Question:

I was expecting that putting the keyword noexcept on a function that can never throw an exception would never hurt that function's performance, just as you wouldn't expect adding the keyword const to have any negative impact.

When I added noexcept to the function isVowel in the example below, it seemed to hurt its performance.

Is there a reason why noexcept could slow down a function that cannot throw exceptions?

Example:

Below I generate a std::array<bool,256> with only the vowel bools set. Then the function bool isVowel(char c) looks up if a character is a vowel or not using the generated array. Here adding/removing noexcept to the function isVowel seems to have changed its performance.

#include <array>
#include <limits>

static constexpr unsigned char vowels[]{ 'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U' };

auto constexpr genVowelTestMap() {
  std::array<bool, 256> m{};
  for (const unsigned char& c : vowels) { 
    m[c] = true;
  }
  return m;
}
static constexpr auto vowelTestMap = genVowelTestMap();

bool isVowel(const char c)noexcept/*The relevant noexcept*/{
  return vowelTestMap[c];
}

In this reproduction I test the functions by generating a random string of letters and then running each function a number of times, using each character in the string as input. The full code can be found at the bottom.

The result is the following time measurements (in my original test the difference was roughly 20%):

Time taken : noexcept version   : normal version.
Time taken : 814786 microseconds:645568 microseconds. //The first is always slowest. Should prob be ignored. 
Time taken : 675711 microseconds:612367 microseconds. //Run order was noexcept, normal and then repeat.
Time taken : 685613 microseconds:605072 microseconds.
Time taken : 655509 microseconds:607108 microseconds.
Time taken : 756300 microseconds:623599 microseconds.
Time taken : 718311 microseconds:605397 microseconds.
Time taken : 672052 microseconds:615306 microseconds.
Time taken : 703469 microseconds:608384 microseconds.
Time taken : 668540 microseconds:604204 microseconds.
Time taken : 667859 microseconds:605363 microseconds.

I compiled as C++20 for x64 using the MSVC compiler with /O2, /Ob2, /Oi and /GL. (For this example I created a new console project in Visual Studio 2022; the only setting I changed was the language version, to C++20.) When switching to /O3 the two functions performed the same.

FULL CODE:

#include <array>
#include <limits>
#include <string>
#include <cassert>
#include <cctype>  // isalpha
#include <cstdlib> // std::rand
#include <chrono>
#include <iostream>

static constexpr unsigned char vowels[]{ 'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U' };

auto constexpr genVowelTestMap() {
  std::array<bool, 256> m{}; static_assert(sizeof(char) == 1);
  for (const unsigned char& c : vowels) { 
    m[c] = true;
  }
  return m;
}
static constexpr auto vowelTestMap = genVowelTestMap();

inline bool isVowel(const char c){
  return vowelTestMap[c];
}
inline bool isVowelNoExcept(const char c)noexcept{
  return vowelTestMap[c];
}

std::string genRandomLetterString(size_t size) {
  std::string s(size, '\0');
  //24 + 24 options = 48 options.
  for (size_t i = 0; i < size; i++) {
    char& c = s[i];
    c = std::rand() % 48;
    if (c < 24) c += 'a';
    else c += 'a' - 24;
    assert(isalpha(c));
  }
  return s;
}

template<bool USE_NO_EXCEPT>
void runTest() {
  const int runs = 1000;
  auto start = std::chrono::high_resolution_clock::now();
  //As people have pointed out, this should have been done before the start. I don't agree that it was a bad idea to generate a different random string for each call, because of caching and the like.
  //Though I could have reseeded the random number generator to still get the same strings.
  const std::string s = genRandomLetterString(1u << 20);
  //I measured the generator function, and it took 17'978ms, so not really significant. With it excluded the difference remains. 

  size_t a = 0;
  for (int i = 0; i < runs; i++) {
    for (const char& c : s) {
      if constexpr (USE_NO_EXCEPT) {
        if (isVowelNoExcept(c))a++;
      }
      else {
        if (isVowel(c))a++;
      }
    }
  }

  auto end = std::chrono::high_resolution_clock::now();
  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  std::cout << (USE_NO_EXCEPT?"NOEXCEPT":"\t") << "\tr(" << (a/runs) << ")Time taken : " << duration << " microseconds." << std::endl;
}

int main() {
  for (int j = 0; j < 10; j++) {
    runTest<true>();
    runTest<false>();
  }
}

I have looked at a slight variant of this code using godbolt, but saw no difference between the two.
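
The timing concerns noted in the comments of runTest (input generated inside the timed region, noisy first runs) can be addressed with a harness along the following lines. This is only a sketch under assumptions: bestTimeMicros and countVowels are hypothetical names that do not appear in the code above, and countVowels is a stand-in workload rather than the table lookup being measured.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical helper: runs `countFn` over `input` `repeats` times and
// returns the fastest wall-clock time in microseconds. The caller builds
// the input, so its generation cost never enters the measurement.
template <class CountFn>
long long bestTimeMicros(const std::string& input, CountFn countFn,
                         int repeats, std::size_t& lastCount) {
  long long best = std::numeric_limits<long long>::max();
  for (int r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    lastCount = countFn(input);  // store the result so the work is observable
    const auto end = std::chrono::steady_clock::now();
    const long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start)
            .count();
    best = std::min(best, us);
  }
  return best;
}

// Stand-in workload (an assumption): counts lowercase vowels in a string.
inline std::size_t countVowels(const std::string& s) {
  std::size_t n = 0;
  for (char c : s) {
    if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') ++n;
  }
  return n;
}
```

Passing the same string to both variants and comparing the minimum over several repetitions tends to be less sensitive to warm-up effects than comparing single runs.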

Answer by Howard Hinnant:

The best way to performance test a function like this is to look at its optimized, un-inlined assembly, both with and without the noexcept.

If the assembly is identical, then so is the performance.

If you see exception tables and mentions of terminate with the noexcept version, then the compiler has been forced to add calls to std::terminate in case the function tries to throw an exception. This will at the very least cause code bloat, which can also negatively impact performance.

My best guess is that vowelTestMap is a simple 256 byte array, and that the addition of noexcept does not change the generated assembly (at least under optimization).
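
If you also want the call sites in a benchmark to stay un-inlined, so that the code being timed matches the assembly being inspected, a compiler-specific attribute can help. The following is a minimal sketch, not the code from the question: NO_INLINE, makeVowelTable, kVowelTable and isVowelPlain are names made up for illustration, and the index is cast to unsigned char here (the original does not do this) to avoid a negative index where char is signed.

```cpp
#include <array>

// MSVC spells the attribute __declspec(noinline); GCC and Clang spell it
// __attribute__((noinline)). NO_INLINE is a hypothetical convenience macro.
#if defined(_MSC_VER)
#define NO_INLINE __declspec(noinline)
#else
#define NO_INLINE __attribute__((noinline))
#endif

// Builds the 256-entry lookup table at compile time (requires C++17).
constexpr std::array<bool, 256> makeVowelTable() {
  std::array<bool, 256> m{};
  const unsigned char vowels[] = {'a', 'e', 'i', 'o', 'u',
                                  'A', 'E', 'I', 'O', 'U'};
  for (unsigned char c : vowels) m[c] = true;
  return m;
}
constexpr auto kVowelTable = makeVowelTable();

// Two otherwise identical functions; only the exception specification differs.
NO_INLINE bool isVowelPlain(char c) {
  return kVowelTable[static_cast<unsigned char>(c)];
}
NO_INLINE bool isVowelNoExcept(char c) noexcept {
  return kVowelTable[static_cast<unsigned char>(c)];
}
```

With both functions kept out of line, their generated assembly can be compared directly.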

Update using Full Code added:

I put this much into godbolt:

#include <array>

static constexpr unsigned char vowels[]{ 'a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U' };

auto constexpr genVowelTestMap() {
  std::array<bool, 256> m{}; static_assert(sizeof(char) == 1);
  for (const unsigned char& c : vowels) { 
    m[c] = true;
  }
  return m;
}
static constexpr auto vowelTestMap = genVowelTestMap();

bool isVowel(const char c){
  return vowelTestMap[c];
}
bool isVowelNoExcept(const char c)noexcept{
  return vowelTestMap[c];
}

Note that I un-inlined the two functions of interest, otherwise no object code would be generated (try that out on the linked demo). I set the compiler to x86 msvc v19.latest and the options to /std:c++20 /O2.

The generated assembly is identical for the two functions:

bool isVowel(char) PROC                              ; isVowel, COMDAT
        movsx   eax, BYTE PTR _c$[esp-4]
        mov     al, BYTE PTR std::array<bool,256> const vowelTestMap[eax]
        ret     0
bool isVowel(char) ENDP                              ; isVowel

_c$ = 8                                       ; size = 1
bool isVowelNoExcept(char) PROC                      ; isVowelNoExcept, COMDAT
        movsx   eax, BYTE PTR _c$[esp-4]
        mov     al, BYTE PTR std::array<bool,256> const vowelTestMap[eax]
        ret     0
bool isVowelNoExcept(char) ENDP 

Demo.

Conclusion: Adding noexcept has no impact on the performance of this function. It might have an impact on code which calls isVowel, but that is beyond the scope of this question.

But what if...

If isVowel tried to do something that might throw an exception, such as call this function:

void f();

then one gets very different results:

_c$ = 8                                       ; size = 1
bool isVowel(char) PROC                              ; isVowel, COMDAT
        call    void f(void)                         ; f
        movsx   eax, BYTE PTR _c$[esp-4]
        mov     al, BYTE PTR std::array<bool,256> const vowelTestMap[eax]
        ret     0
bool isVowel(char) ENDP                              ; isVowel

__$EHRec$ = -12                               ; size = 12
_c$ = 8                                       ; size = 1
bool isVowelNoExcept(char) PROC                      ; isVowelNoExcept, COMDAT
        push    ebp
        mov     ebp, esp
        push    -1
        push    __ehhandler$bool isVowelNoExcept(char)
        mov     eax, DWORD PTR fs:0
        push    eax
        mov     eax, DWORD PTR ___security_cookie
        xor     eax, ebp
        push    eax
        lea     eax, DWORD PTR __$EHRec$[ebp]
        mov     DWORD PTR fs:0, eax
        call    void f(void)                         ; f
        movsx   eax, BYTE PTR _c$[ebp]
        mov     al, BYTE PTR std::array<bool,256> const vowelTestMap[eax]
        mov     ecx, DWORD PTR __$EHRec$[ebp]
        mov     DWORD PTR fs:0, ecx
        pop     ecx
        mov     esp, ebp
        pop     ebp
        ret     0
        int     3
        int     3
        int     3
        int     3
        int     3
__ehhandler$bool isVowelNoExcept(char):
        npad    1
        npad    1
        mov     edx, DWORD PTR [esp+8]
        lea     eax, DWORD PTR [edx+12]
        mov     ecx, DWORD PTR [edx-4]
        xor     ecx, eax
        call    @__security_check_cookie@4
        mov     eax, OFFSET __ehfuncinfo$bool isVowelNoExcept(char)
        jmp     ___CxxFrameHandler3
bool isVowelNoExcept(char) ENDP                      ; isVowelNoExcept

isVowelNoExcept now has to set up code to catch any exception f() might throw and then call std::terminate. This obviously increases code size and adds work on every call, which hurts performance. You would then need benchmarks to find out by how much.
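
To make the extra work concrete, here is a rough source-level picture of what the compiler has to arrange. It is a conceptual sketch only: mayThrow, callNoExcept and callHandWritten are hypothetical names, and the hand-written version is not exactly equivalent (for instance, with noexcept the implementation need not unwind the stack before terminating).

```cpp
#include <exception>

// Hypothetical callee that is not declared noexcept, standing in for f().
void mayThrow(bool doThrow) {
  if (doThrow) throw 42;
}

// What the programmer writes: a noexcept function calling something that
// could throw.
bool callNoExcept(bool doThrow) noexcept {
  mayThrow(doThrow);
  return true;
}

// Roughly what has to happen behind the scenes: any escaping exception is
// intercepted and turned into a call to std::terminate.
bool callHandWritten(bool doThrow) {
  try {
    mayThrow(doThrow);
    return true;
  } catch (...) {
    std::terminate();
  }
}
```

Calling either function with doThrow set to true ends the program through std::terminate; with false, both simply return true.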

My personal guideline: Don't apply noexcept if the compiler is forced to generate code to call std::terminate().

Corollary: Lowest-level functions that can be marked noexcept must be made so before higher-level functions that call the lower-level functions can be marked noexcept.
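
One way to follow this corollary mechanically is to make the higher-level function's exception specification depend on the lower-level one through the noexcept operator. A minimal sketch, with all four function names invented for illustration:

```cpp
// Hypothetical low-level functions: one promises not to throw, one does not.
int lowLevelSafe(int x) noexcept { return x + 1; }
int lowLevelUnknown(int x) { return x + 1; }

// The higher-level functions are noexcept only if the call they wrap is,
// so the compiler never needs to add a std::terminate path on their behalf.
int highLevelSafe(int x) noexcept(noexcept(lowLevelSafe(x))) {
  return lowLevelSafe(x) * 2;
}
int highLevelUnknown(int x) noexcept(noexcept(lowLevelUnknown(x))) {
  return lowLevelUnknown(x) * 2;
}

static_assert(noexcept(highLevelSafe(1)),
              "inherits noexcept from lowLevelSafe");
static_assert(!noexcept(highLevelUnknown(1)),
              "stays potentially throwing, like lowLevelUnknown");
```

If lowLevelUnknown is later marked noexcept, highLevelUnknown becomes noexcept automatically (the second static_assert would then need to be flipped).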

Demo.

Try decorating void f() with noexcept in this demo.