Why does capturing stateless lambdas sometimes results in increased size?

Question

Why does capturing stateless lambdas sometimes results in increased size?

194 Views Asked by Fureeish At 22 April 2022 at 07:40

Given a chain of lambdas where each one captures the previous one by value:

auto l1 = [](int a, int b) { std::cout << a << ' ' << b << '\n'; };
auto l2 = [=](int a, int b) { std::cout << a << '-' << b << '\n'; l1(a, b); };
auto l3 = [=](int a, int b) { std::cout << a << '#' << b << '\n'; l2(a, b); };
auto l4 = [=](int a, int b) { std::cout << a << '%' << b << '\n'; l3(a, b); };

std::cout << sizeof(l4);

We can observe, that the resulting sizeof of l4 is equal to 1.

That makes sense to me. We are capturing lambdas by value and each of those objects has to have sizeof equal to 1, but since they are stateless, an optimization similar to [[no_unique_address]] one applies (especially since they all have unique types).

However, when I try to create a generic builder for chaining comparators, this optimization no longer takes place:

template <typename Comparator>
auto comparing_by(Comparator&& comparator) {
    return comparator;
}

template <typename Comparator, typename... Comparators>
auto comparing_by(Comparator&& comparator, Comparators&&... remaining_comparators) {
    return [=](auto left, auto right) {
        auto const less = comparator(left, right);
        auto const greater = comparator(right, left);
        if (!less && !greater) {
            return comparing_by(remaining_comparators...)(left, right);
        }
        return less;
    };
}

struct triple {
    int x, y, z;
};

auto main() -> int {
    auto by_x = [](triple left, triple right) { return left.x < right.x; };
    auto by_y = [](triple left, triple right) { return left.y < right.y; };
    auto by_z = [](triple left, triple right) { return left.z < right.z; };

    auto comparator = comparing_by(by_x, by_z, by_y);

    std::cout << sizeof(comparator);
}

Note 1: I am aware of the fact that comparing_by is inefficient and sometimes calls the comparator in a redundant fashion.

Why in the above case the resulting sizeof of comparator is equal to 3 and not to 1? It is still stateless, after all. Where am I wrong? Or is it just a missed optimization in all of the big three compilers?

Note 2: This is purely an academic question. I am not trying to solve any particular problem.

Original Q&A

There are 1 best solutions below

**Artyer** · Accepted Answer · 2022-04-22T09:07:21.947000

What's happening in the first example is not what you think it is. Let's say l1 has type L1, l2 L2 , etc. These are the members of those types:

struct L1 {
   // empty;
};

sizeof(L1) == 1

struct L2 {
    L1 l1;
};

sizeof(L2) == sizeof(L1)  // 1

struct L3 {
    L2 l2;
};

sizeof(L3) == sizeof(L2)  // 1

struct L4 {
    L3 l3;
};

sizeof(L4) == sizeof(L3)  // 1

And in your next example, you capture all the lambdas by value, so the closure type has three non-overlapping members, so the size will be at least 3.

[[no_unique_address]] can't be generically applied to the data members of a closure type (consider a empty class that puts its address in a global map).

The compiler could use empty base optimisation for a "well behaved type" (a trivilly-copyable empty type maybe?), so this might be a missed optimisation. The standard says this about what can be done ([expr.prim.lambda.closure]p2):

The closure type is not an aggregate type. An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program other than by changing:

the size and/or alignment of the closure type,

whether the closure type is trivially copyable ([class.prop]), or

whether the closure type is a standard-layout class ([class.prop]).

So the change in size is OK, but it would have to be done so that is_empty_v<lambda_that_captures_stateless_lambda> is not true (since that's an observable behaviour)

To "manually" apply this optimisation, you can, instead of calling the lambda comparator(left, right), default construct something of the type of the closure type and call that (decltype(comparator){}(left, right)). I've implemented that here: https://godbolt.org/z/73M1Gd3o5

Why does capturing stateless lambdas sometimes results in increased size?

There are 1 best solutions below

Related Questions in C++

Related Questions in LAMBDA

Related Questions in LANGUAGE-LAWYER

Related Questions in C++20

Related Questions in MEMORY-OPTIMIZATION

Trending Questions

Popular # Hahtags

Popular Questions