We want to pull stack traces from a running process. Pulling stack traces directly with gstack is not an option and using a gdbserver works, but is quite slow due to the network. We were curious if we can pull a core dump of the process with
gdb --ex "attach $PID" --ex "gcore core_dump_file" --ex "q"
copy the core dump to a container with the compiled executable and then analyze the core dumps inside the container with
gdb $PATH_TO_EXECUTABLE core_dump_file --ex "thread apply all bt" --ex "q"
The core dump can reach sizes of hundreds of gigabytes in production, so we would like to filter the coredump as much as possible.
The memory sections written to the core dump can be filtered by changing the coredump_filter file at /proc/$PID/coredump_filter (see the man page for core). As long as bit 0 (Dump anonymous private mappings) is set, everything works fine and we get the expected backtraces. It can be set with
echo 0x1 > "/proc/$PID/coredump_filter"
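The set-filter and dump steps can be combined so that the original mask is put back afterwards. The following is only a minimal sketch: the function name dump_filtered_core and its argument order are hypothetical, and it assumes gdb is installed, the caller may attach to the process, and gdb's gcore honours coredump_filter as it does in the tests described here.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdb): dump_filtered_core PID MASK OUTFILE
# Temporarily applies MASK to the process, writes a core with gdb's gcore,
# then restores the mask that was in place before.
dump_filtered_core() {
    pid=$1
    mask=$2
    out=$3
    filter="/proc/$pid/coredump_filter"

    # Remember the current mask (the kernel prints it as hex without "0x").
    old=$(cat "$filter") || return 1

    # Apply the requested mask, e.g. 0x1 for anonymous private mappings only.
    echo "$mask" > "$filter" || return 1

    # Attach, write the core, and detach again without interactive prompts.
    gdb --batch -p "$pid" -ex "gcore $out"
    status=$?

    # Restore the original mask so later kernel-generated dumps are unaffected.
    echo "0x$old" > "$filter"
    return "$status"
}
```

It would be called as dump_filtered_core "$PID" 0x1 core_dump_file. The process is stopped while gdb is attached, so the pause grows with the size of the dump.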
Unfortunately, the anonymous private mappings are very large and make up about 75% of the unfiltered coredump, e.g. in one of our test cases, the default coredump with 0x33 was 5 GB, and the filtered coredump with 0x1 was still 3.8 GB.
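To see what a mask such as 0x33 or 0x1 actually selects, the bits can be decoded against the table in the core man page. This is a small sketch; the function name describe_coredump_filter is made up for illustration, and the bit names are taken from that man page.

```shell
#!/bin/sh
# Hypothetical helper: print the mapping types selected by a coredump_filter mask.
# Bit meanings follow the core(5) man page.
describe_coredump_filter() {
    mask=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    bit=0
    for name in \
        "anonymous private mappings" \
        "anonymous shared mappings" \
        "file-backed private mappings" \
        "file-backed shared mappings" \
        "ELF headers" \
        "private huge pages" \
        "shared huge pages" \
        "private DAX pages" \
        "shared DAX pages"
    do
        # Test whether the current bit is set in the mask.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}
```

For example, describe_coredump_filter 0x33 lists bits 0, 1, 4 and 5. The value read back from /proc/$PID/coredump_filter has no 0x prefix, so it would need to be passed as "0x$(cat /proc/$PID/coredump_filter)".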
If we filter as much as possible by setting all bits to zero with
echo 0x0 > "/proc/$PID/coredump_filter"
the resulting coredump becomes very small, only about 1 MB (down from several gigabytes). However, thread apply all bt then fails with
Thread 1 (LWP 2653):
#0 0x00007fae7d4218fd in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffe21387490
With the fully filtered coredump, I can see that gdb does not know which shared libraries to load:
(gdb) info shared
No shared libraries loaded at this time.
What does gdb actually need to get the backtraces? Is there a way to filter everything gdb does not need for the back traces? Is there a better way to get the backtraces (does not necessarily have to use gdb)?
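To get the backtraces, GDB needs to read _DYNAMIC[].r_debug->r_map and the entire chain of struct link_map entries to know where all the DSOs are.

Is there a way to filter everything GDB does not need? Not with a coredump_filter, there isn't.

Is there a better way to get the backtraces? There are some ways:

1. Have the process dump its own stacks: send a signal (SIGPWR or another rarely used signal) into each thread. In the signal handler you could walk the stack (using e.g. libunwind), and write that stack somewhere. It's then a small matter of arranging to send the signal to every thread.
2. Use a tool that lets you control which parts of memory are written to the core, and which are filtered out. It also allows you to compress the core.
3. Use LLDB instead of GDB/gdbserver.

Unfortunately, option (1) requires quite a bit of very fiddly async-signal-safe code, and mistakes may cause your process to crash or deadlock.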
Option (2) doesn't have that problem, but its last public release was 15 years ago, and it may no longer work.
Option (3) may or may not be faster than GDB, but from what I understand LLDB may be a lot less "chatty" than GDB/gdbserver, and that may help with the speed when the network latency is significant.