Initializing a large map in a header causes g++ to crash


I am trying to make a simple tool that needs to look up values in a fixed key-value dataset, so I lazily put all the data into a hash map in a header file:

/** main.h */
#include <unordered_map>
#include <cstdint>
using namespace std;

const unordered_map<uint64_t, const char * const> test = {
    {0xDEADC0DE, "Some short text less than 50 characters"},
    // 46K rows of data
};

I haven't implemented anything yet, but just including this header file is enough to crash the compiler.

main.cpp

/** main.cpp */
#include <iostream>
#include "main.h"

int main() {
    return 0;
}

g++ (cc1plus) maxes out one CPU core for 5 minutes, eats up all 32 GB of RAM, and then crashes. I know a large header can hurt compile times, but I did not expect it to exhaust resources and fail. How can it use 32 GB of RAM when the header file is only 1.9 MB? Could someone please explain what is going on in my case?

The version I am using is g++ (GCC) 13.2.1 20230801, with the command /usr/bin/g++ -O3 -DNDEBUG -o CMakeFiles/main.cpp.o -c /home/foo/main.cpp

Update

I also measured build times for maps of different sizes:

Elements    Build time (hh:mm:ss.mmm)
10          00:00:01.043
100         00:00:01.187
1000        00:00:05.225
2000        00:00:10.200
5000        00:00:25.604
10000       00:00:52.208
20000       00:01:48.090

Update

The problem goes away when I disable compiler optimization. I am using the VS Code CMake extension, and its Release profile adds -O3 to the g++ arguments. Without -O3, the project (46K rows) compiles in 6 seconds. The optimizer must be attempting some expensive transformation on the huge initializer that unfortunately goes wrong.

Answer by Chukwujiobi Canon

The large header file is killing your performance. Don’t bully your compiler!

Imagine including "main.h", which as you say holds over forty-six thousand (46000 + 1) elements, in every source file that needs it. A const object at namespace scope has internal linkage, so every translation unit that includes the header gets its own copy of test, and the compiler is forced to preprocess and compile that huge initializer again for each one. This is bad! Really bad!!

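As I mentioned in my comment, test should be defined in exactly one translation unit, with static storage duration and external linkage. That way it is compiled once, lives until program termination, and can be used from other translation units through an extern declaration. Note that the definition itself must also be marked extern: otherwise the const would give it internal linkage, and the extern declaration in main.cpp would fail to link.

test.cpp

#include <cstdint>
#include <unordered_map>

// extern is required on the definition: without it, a namespace-scope
// const object has internal linkage and is invisible to other files.
extern const std::unordered_map<std::uint64_t, const char* const> test {
    {0xDEADC0DE, "Some short text less than 50 characters"},
    /* 46K rows of data */
};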

main.cpp

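#include <cstdint>
#include <iostream>
#include <unordered_map>

// Declaration only: refers to the single object defined in test.cpp.
extern const std::unordered_map<std::uint64_t, const char* const> test;

int main() {
    // at() works on a const map; operator[] does not (it may insert).
    std::cout << test.at(0xDEADC0DE) << '\n';
    return 0;
}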

Anywhere you want to use test, just declare it extern and you will be referring to the same object in static storage. This way test is not duplicated in every translation unit, and its huge initializer is compiled only once, in test.cpp.