Why does C++ allow making a reference to a variable that goes out of scope?


Consider the following code:

struct Ri {
    int& i;
};

auto do_something() {
    int v{2};
    return Ri{v};
}

Why is this allowed to compile?

It should be possible to infer that the scope of the returned Ri object is that of the caller. Why is it allowed to be created with a reference to a variable with the local scope?

Is there a legitimate use case for this?

Answer by Super-intelligent Shade

Is there a reason why this is allowed to compile?

Because there is nothing in the standard that says it shouldn't.

Surely it is possible to infer that the scope of the returned Ri object is that of the caller. So why is it allowed to be created with a reference to a variable with the local scope?

In this particular case, yes. But in general things may not be so simple.

Here is one example:

int g = 42;

struct Ri { int& i; };

auto do_something()
{
    int& v = g;
    return Ri{v};
}

What should the compiler do in this case? v goes out of scope, but g, which it refers to, is still alive. So now the compiler has to keep track of what each reference is bound to, not just the reference's own scope. Multiply the number of variables by the number of assignments and things may start getting a bit hairy.
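To make this concrete, here is a small sketch (the function pick and the demo function are made up for illustration) where whether the returned reference dangles depends on a run-time value, so no analysis of the function body alone can decide it:

```cpp
#include <cassert>

int g = 42;

struct Ri { int& i; };

// Hypothetical variation on the example above: the referent is chosen at run time.
Ri pick(bool use_global) {
    int local = 7;
    int& v = use_global ? g : local;  // v refers either to g or to local
    return Ri{v};                     // dangles only when use_global is false
}

void demo_pick() {
    Ri r = pick(true);
    assert(r.i == 42);  // fine: r.i refers to g, which outlives pick()
    // Reading pick(false).i would be undefined behavior.
}
```

A compiler that rejected pick outright would also reject the perfectly valid pick(true) calls.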

Here is another example:

extern int& get_v(); // external library with no source code

struct Ri { int& i; };

auto do_something()
{
    int& v = get_v();
    return Ri{v};
}

In this case, the compiler has no way of knowing the lifetime of the object that get_v() returns a reference to. So what should it do now?

UPDATE:

Neither you nor the compiler can assume that the object returned by get_v() lives for the rest of the program, e.g.:

extern void init();   // create value returned by get_v()
extern int& get_v();  // return value
extern void deinit(); // delete value returned by get_v()

struct Ri { int& i; };

auto do_something()
{
    init();
    int& v = get_v();
    deinit();
    return Ri{v};
}
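For illustration only, here is one hypothetical way such a library could be implemented. The real bodies of init(), get_v() and deinit() are unknown; the unique_ptr storage and the demo_library function are assumptions. Nothing visible in do_something() tells the compiler that deinit() destroys the object v refers to:

```cpp
#include <cassert>
#include <memory>

// Hypothetical library internals (assumption, not the real library).
namespace {
std::unique_ptr<int> storage;  // owns the object that get_v() refers to
}

void init() { storage = std::make_unique<int>(0); }
int& get_v() { return *storage; }
void deinit() { storage.reset(); }  // destroys the object get_v() referred to

struct Ri { int& i; };

// Same shape as do_something() above: v is valid when it is bound,
// but deinit() ends the object's lifetime before Ri is returned.
Ri do_something() {
    init();
    int& v = get_v();
    deinit();
    return Ri{v};  // dangling, although nothing in this function shows it
}

void demo_library() {
    init();
    get_v() = 5;
    assert(get_v() == 5);  // fine: the object is alive between init() and deinit()
    deinit();
    // do_something().i must never be read: its object was destroyed by deinit().
}
```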

Is there a legitimate use case for this?

I don't think so.

Answer by 463035818_is_not_an_ai

It is allowed because there is nothing wrong in your code. If you were to access the reference, you would invoke undefined behavior, but you are not doing that. Compilers do try to warn about such accesses, though.
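Note that the initialization Ri{v} is not the problem by itself: a reference member bound to a local is perfectly valid while the local is alive. A minimal sketch (increment_locally is a made-up name):

```cpp
#include <cassert>

struct Ri { int& i; };

// Made-up example: the Ri never outlives v, so no reference dangles.
int increment_locally() {
    int v{2};
    Ri r{v};         // same initialization as in do_something()
    r.i += 1;        // OK: v is still alive, so this updates v
    assert(v == 3);  // r.i and v name the same object
    return v;
}
```

Only returning the Ri out of the function turns it into a dangling reference.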

For example this code:

#include <iostream>

struct Ri {
    int& i;
};

auto do_something() {
    int v{2};
    return Ri{v};
}

int main() {
    auto x = do_something();
    std::cout << x.i;
}

GCC produces the following output (with -Werror -Wall -O3):

<source>: In function 'int main()':
<source>:14:20: error: 'v' is used uninitialized [-Werror=uninitialized]
   14 |     std::cout << x.i;
      |                    ^
<source>:8:9: note: 'v' was declared here
    8 |     int v{2};
      |         ^
cc1plus: all warnings being treated as errors

However, it will not warn when you remove -O3. It will also not warn when you remove the line that attempts to print the value of x.i.

Answer by Artyer

This compiles because the C++ standard says it is well-formed.

You are right that this always returns a dangling reference. So does the much simpler:

int& do_something() {
    int v = 2;
    return v;
}

There are some cases where dangling references are explicitly disallowed. One is in a mem-initializer:

struct Ri {
    Ri() : i(2) {}  // Error: Cannot have reference bind to a temporary
    const int& i = 2;
};

The other is with INVOKE<R> (since C++23):

#include <functional>

int f();
std::function<const int&()> g = f;  // Error: Returned reference would bind to a temporary and immediately dangle

These are both for references that become dangling because of temporaries that immediately die.
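The rule only covers temporary expressions. As a sketch (the type Rp is hypothetical), a mem-initializer that binds to a named parameter still compiles, even though the member dangles once the parameter is gone; some compilers may warn about it:

```cpp
#include <type_traits>

// Hypothetical type: i is bound to the named parameter x.
// x is not a temporary expression, so the mem-initializer rule does not apply.
struct Rp {
    const int& i;
    explicit Rp(int x) : i(x) {}  // i refers to x, which does not outlive the constructor call
};

static_assert(std::is_constructible_v<Rp, int>);  // well-formed, unlike Ri() : i(2)
```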

No one has put in the effort to make your specific example ill-formed. Someone would have to write a paper and submit it to the C++ standards committee.

Such a paper would likely be rejected, though: the rule would take a lot of effort to specify and standardise, and implementations should already warn you about the dangling reference without making the code outright ill-formed.


As for a "legitimate" reason: &do_something().i would have to compare unequal to a pointer to any previously created int object. E.g.:

#include <cassert>

extern Ri x;
extern int i;

void check() {
    assert(&do_something().i != &x.i);
    assert(&do_something().i != &i);
    assert(&do_something().i != nullptr);
}

If your type is something hard to obtain, this dangling reference can give you an lvalue of the correct type that can actually be used in evaluated code (as opposed to std::declval, which may only appear in unevaluated operands):

#include <cstddef>   // std::nullptr_t
#include <typeinfo>  // typeid

struct Rnp {
    std::nullptr_t& np;
};

struct Empty {};

struct Re {
    Empty& e;
};

constexpr Rnp get_np() { std::nullptr_t np; return Rnp{ np }; }
constexpr Re get_e() { Empty e; return Re{ e }; }

static_assert(get_np().np == 0);
static_assert([]{
    Re re = get_e();
    return typeid(re.e) == typeid(Empty);
}());
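For comparison, a minimal sketch of what std::declval allows (Empty and EmptyRef are illustrative names): it works inside decltype, but any evaluated use is ill-formed, which is the gap the dangling-reference trick above fills:

```cpp
#include <type_traits>
#include <utility>

struct Empty {};

// std::declval may appear only in unevaluated operands such as decltype or sizeof.
using EmptyRef = decltype(std::declval<Empty&>());
static_assert(std::is_same_v<EmptyRef, Empty&>);

// Evaluating it is ill-formed, so this would not compile:
// void bad() { Empty& e = std::declval<Empty&>(); }
```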

Or maybe it comes from a code path that is never actually run:

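#include <cstdlib>

struct Ri { int& i; };

template<typename U>
U f(auto&& fn) {
    auto x = fn();
    return U{x};  // would return a reference to the local x, which dangles
}

[[noreturn]] int die() { std::abort(); }  // never returns, so f never reaches its return

int main() {
    f<Ri>(die);  // It might be much more subtle in an actual codebase
}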

These are pretty contrived reasons, but remember that the ultimate reason it compiles is that the standard doesn't forbid it.

In fact, these contrived reasons can also be applied to mem-initializers and INVOKE<R>, but the committee voted that disallowing these constructs is more useful than harmful.