AddressSanitizer randomly throws SIGSEGV with no explanation


Project

I have a game project in C++ that I'm currently developing. I compile every source file with -g3 -std=c++2a -Wall ... -fsanitize=address -fsanitize=leak to check for leaks and segfaults.

The main problem

The problem is that, randomly (about 1 run in 5), ASan (address or leak) terminates the program before it reaches main, with a SIGSEGV and no diagnostics.

AddressSanitizer:DEADLYSIGNAL
=================================================================
==28573==ERROR: AddressSanitizer: SEGV on unknown address 0x625505a4ce68 (pc 0x7cc52585f38f bp 0x000000000000 sp 0x7fff63949020 T0)
==28573==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

The address the SEGV happens on is always different, as is the pc, except for their last 3 hex digits (e68 and 38f respectively).
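The stable last three hex digits fit how ASLR works: mappings are shifted by whole pages, so the offset inside a page does not change between runs. A minimal sketch of that check, assuming the default 4 KiB page size on x86-64 Linux (the helper names page_offset and same_page_offset are made up for illustration):

```cpp
#include <cstdint>

// Assumption: 4 KiB pages, the default on x86-64 Linux.
constexpr std::uint64_t kPageSize = 0x1000;

// Offset of an address inside its page, i.e. the low 12 bits.
// ASLR moves mappings by whole pages, so this part stays the same.
constexpr std::uint64_t page_offset(std::uint64_t address) {
    return address & (kPageSize - 1);
}

// True when two addresses share a page offset, which is what one would
// expect for the same instruction seen under two different ASLR bases.
constexpr bool same_page_offset(std::uint64_t a, std::uint64_t b) {
    return page_offset(a) == page_offset(b);
}
```

Applied to the pc values in the reports on this page (0x7cc52585f38f, 0x779949c4638f, 0x7f62b3d5338f), page_offset gives 0x38f every time, which suggests that all of these crashes hit the same instruction.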

The system it runs on

My machine runs Arch Linux (kernel 6.7.0-arch3-1), and I'm using g++ (GCC) 13.2.1 20230801 and GNU gdb (GDB) 13.2, which are the latest versions in the repositories at the time of writing.

What I've tried

I have no idea how to hunt down this bug, nor what might be causing it.

In code

I am sure the problem happens before main, since printing something at the start of main (with cout or printf) has no effect, and neither does installing a signal handler with signal(SIGSEGV, &handle);
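One way to narrow down "before main" a little further is a static object whose constructor reports that the program's own static initialization was reached. This is only a sketch: StartupProbe and startup_probe_ran are hypothetical names, and it assumes the sanitizer runtime initializes before the executable's own static objects, as the _dl_init frame in the backtrace further down suggests.

```cpp
#include <unistd.h>

namespace {

// Constant-initialized, so it already holds a valid value when the
// probe's constructor runs.
bool g_probe_ran = false;

// Hypothetical helper: the constructor runs during static initialization,
// before main() is entered.
struct StartupProbe {
    StartupProbe() {
        static const char msg[] = "startup probe: static initializers reached\n";
        // write(2) is unbuffered, so the message is not lost if the
        // process crashes right afterwards.
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)ignored;
        g_probe_ran = true;
    }
};

StartupProbe g_probe;

}  // namespace

// Lets other code check that the probe ran.
bool startup_probe_ran() { return g_probe_ran; }
```

If the probe message never shows up on a crashing run, the fault occurs before any of the program's own initializers, which points away from the project's code.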

asan is part of it

Without ASan the SEGV does not happen. (I tried about 50 times and the program started correctly every time.)
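Counting failures by hand over dozens of runs is tedious and easy to get wrong. A small POSIX sketch that runs a command repeatedly and counts abnormal terminations (run_fails and count_failures are illustrative names, not part of any tool mentioned here):

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <vector>

// Runs `argv` once; returns true if the child was killed by a signal
// or exited with a non-zero status.
bool run_fails(const std::vector<std::string>& argv) {
    if (argv.empty()) return true;  // nothing to run: treat as a failure

    std::vector<char*> args;
    for (const std::string& a : argv) args.push_back(const_cast<char*>(a.c_str()));
    args.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return true;       // fork failed
    if (pid == 0) {
        execvp(args[0], args.data());
        _exit(127);                 // only reached if exec failed
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return true;  // unexpected wait error
    }
    return !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
}

// Runs the command `runs` times and returns how many runs failed.
int count_failures(const std::vector<std::string>& argv, int runs) {
    int failures = 0;
    for (int i = 0; i < runs; ++i) {
        if (run_fails(argv)) ++failures;
    }
    return failures;
}
```

For example, count_failures({"./game"}, 50) (with ./game standing in for the real binary) gives a failure count that can be compared between the ASan and non-ASan builds. This only works for a program that exits on its own with status 0 when it runs normally.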

gdb

Running the ASan build under gdb with gdb's ASLR override turned off (set disable-randomization off, so address randomization stays active) reproduces the SIGSEGV, and gdb catches it automatically.

assembly instruction of the problem

Given the strange pattern in the addresses, I tried setting a watchpoint on any $pc ending in 38f (watch ((size_t)$pc & 0xfff) == 0x38f). The watchpoint works: the address in question is inside a glibc function (do_lookup_x or similar) that is apparently called thousands of times before main begins, which makes debugging this way practically a nightmare.

The question

I would like to ask if anybody has any idea on how to get more information out of asan, gdb, or any other tool, because at this moment I do not have enough information to know where the problem happens or even if the problem is mine or not.


Updates

@marekR and @eljay suggested some kind of symbol collision with glibc function names. Most of my definitions are enclosed in a namespace (and thus also name-mangled), and the only functions generic enough to collide with another name are init(), loop(), and terminate(). Renaming them did not solve the issue.

Following @ÖöTiib's suggestion, I tested my git history with git bisect. The problem has been present since the first commit, back in 2019. This means that either it has gone unnoticed all this time (I'm the only one working on this project, but that still seems unlikely), or it is caused by a combination of factors local to my machine, or by something else.


There are 4 best solutions below

0
Leonard On BEST ANSWER

Thanks to @EmployedRussian I was able to track down the origin of the bug. Since that was the point of this question, I'm closing this post.

I will try to solve the bug myself and, if I can't, open another question or a report on the ASan bug tracker.

In any case, thank you for helping me.

For anyone interested: compiling the binary with -fsanitize=address and running it under gdb with set disable-randomization off can trigger the SIGSEGV, and gdb should catch it automatically.

I'd consider this question closed.

2
Sentakki On

I encountered the same issue; simply downgrading the kernel fixed it. I'm not sure why, but it seems that ASan doesn't get along with the current kernel version. After running the following command and a quick reboot, everything returned to normal:

pacman -Udd file:///var/cache/pacman/pkg/linux-headers-6.6.10.arch1-1-x86_64.pkg.tar.zst file:///var/cache/pacman/pkg/linux-6.6.10.arch1-1-x86_64.pkg.tar.zst

Replace "6.6.10.arch1-1" in both file names with the version you wish to downgrade to, and it should do the trick.

0
dk949 On

(I know this isn't an answer, exactly, but I couldn't post it as a comment)

If we are seeing the same issue, which appears to be the case judging by its spurious nature and the non-typical asan output, I'm pretty sure the bug is not in your code. This program exhibits the same behaviour (crash ~ every 5 times, all fine without asan and valgrind also says nothing):

#include <array>

int main() {
    std::array p {1, 2};
    for (auto i : p) { }
}

Compiled with g++ test.cpp -std=c++20 -fsanitize=address, running the executable a few times produces:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==38449==ERROR: AddressSanitizer: SEGV on unknown address 0x6212e1c61e78 (pc 0x779949c4638f bp 0x000000000000 sp 0x7fff726381e0 T0)
==38449==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

My system: Arch Linux, kernel 6.6.12-1-lts, g++ (GCC) 13.2.1 20230801

P.S.

Forgot to mention, this does appear to happen with clang (clang version 16.0.6, using libstdc++)

0
manolitomanolon On

C instead of C++, same kernel (6.7.0-arch3-1), and the same behaviour with -fsanitize=address:

AddressSanitizer:DEADLYSIGNAL

==123687==ERROR: AddressSanitizer: SEGV on unknown address 0x62a50d774e78 (pc 0x7f62b3d5338f bp 0x000000000000 sp 0x7ffdf31e6480 T0)
==123687==The signal is caused by a READ memory access.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

There are no problems when compiling without -fsanitize, so I guess it isn't our fault. It must be something related to glibc, the sanitizer runtime, or similar.

P.S.: Thank god I found your post. I had spent the last two hours checking commits from the last month. After that I tried the same program with the same CFLAGS on macOS, and it doesn't throw any SEGV errors. I haven't tried any other Linux distros, though.

This is the backtrace I obtain with gdb with the tips you shared:

(gdb) bt
#0  do_lookup_x (undef_name=undef_name@entry=0x7c00b973e6d8 "_thread_db_sizeof_pthread",
    new_hash=new_hash@entry=3872132951, old_hash=old_hash@entry=0x7ffc4c9abbe8, ref=0x0,
    result=result@entry=0x7ffc4c9abbf0, scope=<optimized out>, i=0, version=0x0, flags=3,
    skip=<optimized out>, type_class=0, undef_map=<optimized out>) at dl-lookup.c:405
#1  0x00007c00b9e240b8 in _dl_lookup_symbol_x (undef_name=0x7c00b973e6d8 "_thread_db_sizeof_pthread",
    undef_map=<optimized out>, ref=0x7ffc4c9abc78, symbol_scope=<optimized out>, version=0x0,
    type_class=0, flags=3, skip_map=0x0) at dl-lookup.c:793
#2  0x00007c00b957300e in do_sym (handle=<optimized out>,
    name=0x7c00b973e6d8 "_thread_db_sizeof_pthread",
    who=0x7c00b96fffb3 <__sanitizer::ThreadDescriptorSize()+35>, vers=vers@entry=0x0,
    flags=flags@entry=2) at dl-sym.c:146
#3  0x00007c00b9573331 in _dl_sym (handle=<optimized out>, name=<optimized out>, who=<optimized out>)
    at dl-sym.c:195
#4  0x00007c00b94a6ae8 in dlsym_doit (a=a@entry=0x7ffc4c9abee0) at dlsym.c:40
#5  0x00007c00b9e1b4e1 in __GI__dl_catch_exception (exception=exception@entry=0x7ffc4c9abe40,
    operate=0x7c00b94a6ad0 <dlsym_doit>, args=0x7ffc4c9abee0) at dl-catch.c:237
#6  0x00007c00b9e1b603 in _dl_catch_error (objname=0x7ffc4c9abe98, errstring=0x7ffc4c9abea0,
    mallocedp=0x7ffc4c9abe97, operate=<optimized out>, args=<optimized out>) at dl-catch.c:256
#7  0x00007c00b94a64f7 in _dlerror_run (operate=operate@entry=0x7c00b94a6ad0 <dlsym_doit>,
    args=args@entry=0x7ffc4c9abee0) at dlerror.c:138
#8  0x00007c00b94a6b75 in dlsym_implementation (dl_caller=<optimized out>, name=<optimized out>,
    handle=<optimized out>) at dlsym.c:54
#9  ___dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:68
#10 0x00007c00b96fffb3 in __sanitizer::ThreadDescriptorSize ()
    at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:298
#11 0x00007c00b97017ae in __sanitizer::ThreadDescriptorSize ()
    at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:294
#12 __sanitizer::GetTls (size=0x7ffc4c9abfb8, addr=0x7c00b9dfa040)
    at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:498
#13 __sanitizer::GetThreadStackAndTls (main=true, stk_addr=stk_addr@entry=0x7c00b9dfa020,
    stk_size=stk_size@entry=0x7ffc4c9abfc0, tls_addr=tls_addr@entry=0x7c00b9dfa040,
    tls_size=tls_size@entry=0x7ffc4c9abfb8)
    at /usr/src/debug/gcc/gcc/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:595
#14 0x00007c00b96f0ff4 in __asan::AsanThread::SetThreadStackAndTls (this=this@entry=0x7c00b9dfa000,
    options=<optimized out>) at /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_thread.h:77
#15 0x00007c00b96f14ee in __asan::AsanThread::Init (this=this@entry=0x7c00b9dfa000,
    options=options@entry=0x0) at /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_thread.cpp:234
#16 0x00007c00b96f19e5 in __asan::AsanThread::ThreadStart (this=this@entry=0x7c00b9dfa000,
    os_id=130231) at /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_thread.cpp:264
#17 0x00007c00b96f2604 in __asan::CreateMainThread ()
    at /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_thread.cpp:295
#18 0x00007c00b96ee9df in __asan::AsanInitInternal ()
    at /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_rtl.cpp:480
#19 0x00007c00b9e1f02a in _dl_init (main_map=0x7c00b9e4e2d0, argc=1, argv=0x7ffc4c9ac0c8,
    env=0x7ffc4c9ac0d8) at dl-init.c:122
init.c 
121   signal(SIGINT, ft_sig_handler);
122   signal(SIGQUIT, ft_sig_handler);

makes no sense