Preamble
This describes what I am trying to do with the code; skip to the next section to see the actual issue.
I want to use coroutines in an embedded system, where I can't afford too many dynamic allocations. Therefore, I am trying the following: I have non-copyable, non-movable awaitable types for the various queries to peripherals. When querying a peripheral, I use something like auto result = co_await Awaitable{params}. The constructor of the awaitable prepares the request to the peripheral, registers its internal buffer to receive the reply, and registers its ready flag in the promise. The coroutine is then suspended.
Later, the buffer will be filled, and the ready flag will be set to true. After this, the coroutine knows that it can be resumed, which causes the awaitable to copy out the result from the buffer before being destroyed.
The awaitable is non-copyable and non-movable to force guaranteed copy elision everywhere, so that I can be sure that the pointers to buffer and ready remain valid until the awaitable has been awaited (at least that was the plan...)
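The address-stability assumption above can be checked outside of any coroutine. The following is a minimal sketch, not the original code: PinnedBase, PinnedWrapper, make_pinned_base and address_is_stable are hypothetical names, and the self pointer is an addition made only so that a relocation would be detectable. Because every copy and move operation is deleted, C++17 guaranteed copy elision requires the prvalue returned by the factory to be constructed directly inside the wrapper.

```cpp
#include <type_traits>

// Hypothetical stand-in for a non-copyable, non-movable awaitable base.
// It records the address it was constructed at (an addition for this sketch).
struct PinnedBase {
    PinnedBase() : self(this) {}
    PinnedBase(const PinnedBase&) = delete;
    PinnedBase(PinnedBase&&) = delete;
    PinnedBase& operator=(const PinnedBase&) = delete;
    PinnedBase& operator=(PinnedBase&&) = delete;

    const PinnedBase* self;  // address seen by the constructor
    char buffer[65] = {};
};

// The type really cannot be copied or moved through its constructors.
static_assert(!std::is_copy_constructible_v<PinnedBase>);
static_assert(!std::is_move_constructible_v<PinnedBase>);

// Aggregate wrapper, mirroring how a wrapping awaitable holds its base.
struct PinnedWrapper {
    PinnedBase base;
    bool ready{false};
};

// Returns a prvalue; no copy or move constructor is needed (or available).
PinnedBase make_pinned_base() { return PinnedBase{}; }

// True if the base was constructed in place inside the wrapper.
bool address_is_stable() {
    PinnedWrapper w{make_pinned_base()};
    return w.base.self == &w.base;
}
```

In ordinary (non-coroutine) code like this, a conforming compiler has no permission to relocate the object, so address_is_stable() is expected to return true.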
The issue
I am encountering an issue with ARM GCC 11.3 in the following code:
#include <cstring>
#include <coroutine>

struct AwaitableBase {
    AwaitableBase() = default;
    AwaitableBase(const AwaitableBase&) = delete;
    AwaitableBase(AwaitableBase&&) = delete;
    AwaitableBase& operator=(const AwaitableBase&) = delete;
    AwaitableBase& operator=(AwaitableBase&&) = delete;
    char buffer[65];
};

struct task {
    struct promise_type
    {
        bool* ready_ptr;
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

struct Awaitable {
    AwaitableBase base;
    bool ready{false};
    bool await_ready() { return false; }
    void await_suspend(std::coroutine_handle<task::promise_type> handle)
    {
        handle.promise().ready_ptr = &ready;
    }
    int await_resume() { return 2; }
};

AwaitableBase make_awaitable_base()
{
    return AwaitableBase{};
}

task example()
{
    co_await Awaitable{make_awaitable_base()};
}
When compiling this with ARM GCC 11.3 without any optimizations, the code contains a memcpy call that moves around the AwaitableBase object (excerpt from Godbolt):
ldr r3, [r7, #4]
adds r3, r3, #87
mov r0, r3
bl make_awaitable_base()
ldr r2, [r7, #4]
ldr r3, [r7, #4]
add r0, r2, #21
adds r3, r3, #87
movs r2, #65
mov r1, r3
bl memcpy
ldr r3, [r7, #4]
movs r2, #0
strb r2, [r3, #86]
ldr r3, [r7, #4]
adds r3, r3, #21
mov r0, r3
bl Awaitable::await_ready()
This breaks my code, as I am relying on the fact that the object cannot be moved or copied. It was my understanding that making an object non-copyable and non-movable should prevent it from being memcpy'd.
Observations/Comments
- The memcpy is no longer present in 13.1. Unfortunately, I am stuck with 11.3.
- The memcpy is not present if I remove the aggregate initialization of Awaitable wrapped around AwaitableBase (and instead make AwaitableBase itself the awaitable). This doesn't work for me because I'd like to wrap other awaitables with Awaitable to modify their behavior.
- The memcpy is not present without the co_await.
- As noted previously, I need the awaitable to have a stable address, as I rely on the fact that I can look at the ready_ptr stored in the promise to check if the awaitable is done.
Question(s)
How can I work around this?
Is it a bug with the compiler, or am I misunderstanding something about guaranteed copy elision? Is it undefined behavior to rely on the fact that the address of the temporary should not change during the duration of the co_await call?
As pointed out in the comments, this is a GCC bug, where prvalues created by constructing objects in co_await expressions are erroneously treated as trivially copyable aggregates, creating a temporary that is memcpy'd from.
The fix is to never construct a non-trivial object directly in a co_await expression. E.g., co_await Class{ ... }, co_await function_call(Class{ ... }) and co_await Class{ ... }.member_function() are all prone to this bug.
You can replace these with co_await [&]{ return ...; }(); (which is co_await lambda_type(captured_references...)(), where that lambda type can safely be copied with memcpy).
You might want to macro-ify this to
#define CO_AWAIT(...) co_await [&]() -> decltype(auto) { return __VA_ARGS__ ; }()
so you can just search for lowercase co_await in your code base to completely eliminate this bug.