Shouldn't this give me dangling reference errors?


I'm returning to C++ after about a decade of other programming languages so bear with me here. The following program compiles for me in a C++20 project in CLion:

#include <iostream>

using namespace std;

class MyClass {
private:
public:
    MyClass() {
        cout << "MyClass constructor" << endl;
    }
    MyClass(const MyClass& myClass) {
        cout << "MyClass copy constructor" << endl;
    }

    MyClass& operator=(const MyClass& myClass) {
        cout << "MyClass operator=" << endl;
        return *this;
    }
    friend std::ostream& operator<< (std::ostream& os, const MyClass &myClass){
        return os << "MyClass stringifier";
    }
    ~MyClass(){
        cout << "MyClass destructor" << endl;
    }
};

MyClass& f(){
    cout << "f() entered" << endl;
    MyClass x;
    return x;
}

MyClass* g(){
    cout << "g() entered" << endl;
    MyClass x;
    return &x;
}

int main() {
    cout << "main() entered" << endl;
    MyClass& a = f();
    cout << "f() exited" << endl;
    cout << a << endl;
    MyClass* b= g();
    cout << "g() exited" << endl;
    cout << *b << endl;
}

and the output is:

main() entered
f() entered
MyClass constructor
MyClass destructor
f() exited
MyClass stringifier
g() entered
MyClass constructor
MyClass destructor
g() exited
MyClass stringifier

Now, when I hover over the return statements, CLion does give me some warnings:

[Screenshot: "Address of stack memory" warning]

[Screenshot: "Reference to stack memory" warning]

It might be interesting to note that this now quite old article mentions that the scenario generated by function f() is "not valid" C++. Yet my compiler seems to disagree.

I am surprised that this program compiles and runs without issue. Should I not be getting a "dangling reference" error at runtime where I try to print a and *b? I don't think that this falls under Return Value Optimization / Copy Elision since I am not returning a new instance of MyClass anywhere: the copy constructor is clearly not called anywhere since we can't see its printing side effect.

// Edit: The accepted answer to this SO post, posted in 2011, also seems to suggest that what f does here should not work.

Answer by selbie:

Here's a simple example using the same MyClass as you've provided:

int main() {
    MyClass* ptr = g();
    std::cout << *ptr << std::endl;
}

Technically, dereferencing a dangling pointer like that is undefined behavior. But as you've noted, it just happens to work. For that matter, g() could even return NULL and this program would likely still work. Here's why it just happens to work:

At the end of the day, the functions you defined get compiled into plain functions much like C functions, with name mangling to distinguish overloads. Your operator<< is a friend, so it is a free function rather than a member and has no "this" pointer; its myClass reference parameter is simply passed as a pointer. Ignoring name mangling, the compiler generates something like this:

std::ostream* MyClass_operator_ostream(std::ostream* os, const MyClass* myClass) {
    os->write("MyClass stringifier", 19);
    return os;
}

And so your main is basically doing this:

    MyClass* ptr = g();
    MyClass_operator_ostream(&cout, ptr);

However, your stream operator implementation doesn't actually use the myClass pointer. So no code is generated that would touch that bad pointer, and the dangling reference never actually gets hit.
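The point about an unused pointer can be shown without any undefined behavior. The sketch below is only an illustration: `Widget`, `write_label`, `label_for`, and `demo_unused_pointer` are hypothetical names, not anything the compiler generates. It passes a null pointer to a function that never dereferences it, which is well-defined, and the output is the same as for a valid object.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Widget {};  // hypothetical stand-in for MyClass (no data members)

// Roughly the shape described above: the object arrives as a pointer,
// but the body never dereferences it.
std::ostream& write_label(std::ostream& os, const Widget* /*unused*/) {
    return os << "MyClass stringifier";
}

// Collects the text written for a given pointer value.
std::string label_for(const Widget* w) {
    std::ostringstream out;
    write_label(out, w);
    return out.str();
}

// The pointer value has no effect on the result because it is never read through.
void demo_unused_pointer() {
    Widget w;
    assert(label_for(&w) == "MyClass stringifier");
    assert(label_for(nullptr) == "MyClass stringifier");
}
```

With a dangling pointer instead of a null one, the call is still undefined behavior by the standard; the sketch only shows why nothing visibly goes wrong when the pointer is never used.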

Now let's say you extended MyClass to include a member variable and then your stream operator overload references that variable.

class MyClass {

    int value;   // gets assigned by constructor

    ...

    friend std::ostream& operator<< (std::ostream& os, const MyClass &myClass){
        return os << "MyClass stringifier.  value = " << myClass.value;
    }

    ...

};

Hence, the compiler will generate code like this:

std::ostream* MyClass_operator_ostream(std::ostream* os, const MyClass* myClass) {
    os->write("MyClass stringifier.  value = ", 30);

    const char* tmp = some_internal_code_to_convert_int_to_string(myClass->value);

    os->write(tmp, strlen(tmp));
    return os;
}

Oops, now myClass->value is getting used. If myClass is null, it will almost certainly crash. If it points to stack memory from a function that has already returned, it might print out the expected value. More likely, it will print whatever happens to be on the stack at that moment, especially if another function was called between g() returning and *b being printed.
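To see why the old stack memory tends to get overwritten, here is a small sketch that stays within defined behavior: each function records the address of its own local while that local is still alive, and nothing is dereferenced afterwards. The names `record_first`, `record_second`, and `same_stack_slot` are made up for illustration, and whether the two addresses match depends on the compiler, optimization level, and platform.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Stores the address of a live local as an integer. The address is only
// recorded, never dereferenced after the function returns.
void record_first(std::uintptr_t& out) {
    int local = 1;
    out = reinterpret_cast<std::uintptr_t>(&local);
}

void record_second(std::uintptr_t& out) {
    int local = 2;
    out = reinterpret_cast<std::uintptr_t>(&local);
}

// Prints both addresses and reports whether the second call reused the
// first call's stack slot. The result is not guaranteed either way.
bool same_stack_slot() {
    std::uintptr_t first = 0;
    std::uintptr_t second = 0;
    record_first(first);
    record_second(second);
    std::cout << first << " vs " << second << '\n';
    return first == second;
}
```

When the two addresses do match, a pointer left over from the first call would be looking at the second call's data, which is the "whatever is on the stack" case described above.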

Everything I've said above refers to your g() function, which returns a pointer. But the f() function behaves the same way: compilers typically implement references as pointers under the hood.
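As a rough illustration of that equivalence, the sketch below uses a hypothetical `Point` type and two hypothetical accessors. Taking the address of a reference yields the address of the object it refers to, and the pointer and reference versions read the same memory. Mainstream compilers typically emit the same machine code for both, although the standard does not require that.

```cpp
#include <cassert>

struct Point { int x = 7; };  // hypothetical type for illustration

// The same read, once through a pointer and once through a reference.
int via_pointer(const Point* p)   { return p->x; }
int via_reference(const Point& r) { return r.x; }

void demo_reference_vs_pointer() {
    Point pt;
    const Point& ref = pt;
    assert(&ref == &pt);  // a reference names the same object, same address
    assert(via_pointer(&pt) == via_reference(pt));
}
```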

Technically, though, it's undefined behavior, so you can't rely on anything I've said in this answer to hold. It's just the most likely explanation.
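For comparison, here is a minimal sketch of two well-defined ways to hand an object back from a function. `Item`, `make_by_value`, and `make_on_heap` are hypothetical names, with `Item` standing in for MyClass. In both cases nothing refers to the callee's stack frame after it returns.

```cpp
#include <cassert>
#include <memory>

struct Item {        // hypothetical stand-in for MyClass
    int value = 42;
};

// Return by value: the caller gets its own object (via copy elision or a
// move), so nothing points into this function's dead stack frame.
Item make_by_value() {
    Item x;
    return x;
}

// Return an owning pointer: the object lives on the heap until the
// unique_ptr that owns it is destroyed.
std::unique_ptr<Item> make_on_heap() {
    return std::make_unique<Item>();
}

void demo_safe_returns() {
    Item a = make_by_value();
    std::unique_ptr<Item> b = make_on_heap();
    assert(a.value == 42);
    assert(b->value == 42);
}
```

Returning by value is usually the simpler choice; the owning pointer is mainly useful when the object has to outlive the caller's scope or is expensive to move.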