TL;DR: Why does this https://godbolt.org/z/ohK31hW34 multithreaded program segfault?
Explanation: I am encountering weird behavior of my multithreaded C++ application. The application has multiple threads that loop in a while loop guarded by an std::atomic<bool> variable. I use this construct in multiple places, so I extracted it to a simple ThreadLoop class with Start(function) & Stop() methods.
class ThreadLoop
{
public:
ThreadLoop(const std::string& name) : mName(name) {}
~ThreadLoop() { Stop(); }
template <typename F>
void Start(F&& function)
{
if (mRunning)
return;
std::scoped_lock lock(mMutex);
if (mThread.joinable())
mThread.join();
mRunning = true;
mThread = std::thread([&]() {
while (mRunning)
{
function();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
});
}
void Stop()
{
if (not mRunning)
return;
mRunning = false;
std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
private:
std::atomic<bool> mRunning = false;
std::mutex mMutex;
std::thread mThread;
std::string mName;
};
I then use an object of this custom class as a member in a different 'worker' class that assigns a particular function to be periodically executed, like so
class Worker1
{
public:
void StartWorking()
{
mThread.Start([this]() { Work(); });
}
void StopWorking() { mThread.Stop(); }
private:
ThreadLoop mThread{"worker1 loop"};
void Work()
{
fmt::print("Working...\n");
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
};
I have all of these 'workers' in another class and call StartWorking() / StopWorking() on them at random time points (also in a ThreadLoop)
class Main
{
public:
void Start()
{
mThread.Start([this]() { MainLoop(); });
}
void Stop() { mThread.Stop(); }
private:
ThreadLoop mThread{"main loop"};
Worker1 mWorker1;
void MainLoop()
{
if (/*something*/)
mWorker1.StartWorking();
else
mWorker1.StopWorking();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
};
The first ThreadLoop ("main loop") in the Main class starts fine and starts calling StartWorking() / StopWorking(), as expected. The StartWorking() then triggers the worker to start its own ThreadLoop ("worker1 loop"), which fails undeterministically inside the ThreadLoop::Start() function, for example via
`../nptl/pthread_mutex_lock.c:81: __pthread_mutex_lock: Assertion mutex->__data.__owner == 0 failed`
Also, according to the debugger, the entire ThreadLoop object seems to be uninitialized / destroyed (e.g. the std::string mName variable is empty, although I always provide a non-empty string) - this probably causes the std::scoped_lock to fail - locking an uninitialized/destroyed mutex. My question is, how/why is the ThreadLoop object uninitialized? I think I clearly construct it as a member inside each Worker1 object?
The issue was that I was capturing the
T&& functionby reference in the lambda insideThreadLoop::Start(), and the thread then invoked the lambda, which was a dangling reference. The fix was to captureT&& functionby value.Original version:
Fixed version:
Even better version: