Given a chain of lambdas where each one captures the previous one by value:
auto l1 = [](int a, int b) { std::cout << a << ' ' << b << '\n'; };
auto l2 = [=](int a, int b) { std::cout << a << '-' << b << '\n'; l1(a, b); };
auto l3 = [=](int a, int b) { std::cout << a << '#' << b << '\n'; l2(a, b); };
auto l4 = [=](int a, int b) { std::cout << a << '%' << b << '\n'; l3(a, b); };
std::cout << sizeof(l4);
We can observe, that the resulting sizeof of l4 is equal to 1.
That makes sense to me. We are capturing lambdas by value and each of those objects has to have sizeof equal to 1, but since they are stateless, an optimization similar to [[no_unique_address]] one applies (especially since they all have unique types).
However, when I try to create a generic builder for chaining comparators, this optimization no longer takes place:
template <typename Comparator>
auto comparing_by(Comparator&& comparator) {
return comparator;
}
template <typename Comparator, typename... Comparators>
auto comparing_by(Comparator&& comparator, Comparators&&... remaining_comparators) {
return [=](auto left, auto right) {
auto const less = comparator(left, right);
auto const greater = comparator(right, left);
if (!less && !greater) {
return comparing_by(remaining_comparators...)(left, right);
}
return less;
};
}
struct triple {
int x, y, z;
};
auto main() -> int {
auto by_x = [](triple left, triple right) { return left.x < right.x; };
auto by_y = [](triple left, triple right) { return left.y < right.y; };
auto by_z = [](triple left, triple right) { return left.z < right.z; };
auto comparator = comparing_by(by_x, by_z, by_y);
std::cout << sizeof(comparator);
}
Note 1: I am aware of the fact that comparing_by is inefficient and sometimes calls the comparator in a redundant fashion.
Why in the above case the resulting sizeof of comparator is equal to 3 and not to 1? It is still stateless, after all. Where am I wrong? Or is it just a missed optimization in all of the big three compilers?
Note 2: This is purely an academic question. I am not trying to solve any particular problem.
What's happening in the first example is not what you think it is. Let's say
l1has typeL1,l2L2, etc. These are the members of those types:And in your next example, you capture all the lambdas by value, so the closure type has three non-overlapping members, so the size will be at least 3.
[[no_unique_address]]can't be generically applied to the data members of a closure type (consider a empty class that puts its address in a global map).The compiler could use empty base optimisation for a "well behaved type" (a trivilly-copyable empty type maybe?), so this might be a missed optimisation. The standard says this about what can be done ([expr.prim.lambda.closure]p2):
So the change in size is OK, but it would have to be done so that
is_empty_v<lambda_that_captures_stateless_lambda>is nottrue(since that's an observable behaviour)To "manually" apply this optimisation, you can, instead of calling the lambda
comparator(left, right), default construct something of the type of the closure type and call that (decltype(comparator){}(left, right)). I've implemented that here: https://godbolt.org/z/73M1Gd3o5