Is accessing a struct within a struct by pointers undefined behavior?


I have a struct A that contains a struct B, and I access A via a pointer pa and B via a pointer pb. Is this undefined behavior under the strict aliasing rule? (The struct B part is not accessed through the pointer to struct A, pa.) Note that there is no cast, but there are still pointers of different types referring to the same memory.

struct B {int x;};
struct A { int foo; struct B b; int bar;};
struct A global_a = { 1, { 2 }, 3 };
struct A * get_ptr_to_struct_A()
{
        return &global_a;
}
void modify_struct_B(struct B * p)
{
        p->x++;
}
int main()
{
    struct A local_A;
    struct A * pa = get_ptr_to_struct_A();
    struct B * pb = &pa->b;
    local_A = *pa; // First assignment
    modify_struct_B(pb);
    local_A = *pa;  // 2nd assignment. Can the compiler choose to skip this?
}

According to the strict aliasing rule, the compiler can assume that pointers of different types do not point to the same memory. Hence, the compiler can draw the conclusion that the second assignment is meaningless and optimize it away. (I.e. it assumes that the pointer pb points somewhere where the pointer pa does not point.)
What prevents the compiler from ignoring the second assignment?


Answer by Andrés Alcarraz

Short answer: it is not undefined behavior.

Why?

  struct A * pa = get_ptr_to_struct_A();

pa points to global_a, which contains foo=1, b.x=2, and bar=3.

  struct B * pb = &(pa->b);

Now pb points to (has the address of) global_a.b.

  modify_struct_B(pb); 

You are passing a valid pointer to an instance of struct B (pb->x=2).

void modify_struct_B(struct B * p)
{
        p->x++;
}

It just increments pb->x, so after this, pb->x=3.

Every step is well defined, so there is no room for the compiler to do anything different: the second assignment must copy the updated value.

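The strict aliasing rule is not violated here: every access goes through an lvalue whose type is allowed for the object being accessed. pb->x reads and writes an int object through an int lvalue, and *pa accesses global_a through its own type, struct A, which is also an aggregate that contains struct B and int among its members. Nothing is reinterpreted or cast.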

The cppreference page https://en.cppreference.com/w/c/language/object describes the rule as being about accessing an object of one type through an lvalue of a different type:

Strict aliasing

Given an object with effective type T1, using an lvalue expression (typically, dereferencing a pointer) of a different type T2 is undefined behavior

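So you would have undefined behavior if you accessed the storage through an lvalue of an unrelated type, for example:

  float * pf = (float *)&pa->b.x;
  *pf = 1.0f; // writes an int object through a float lvalue

Converting pa to a pointer to its first member is not a violation in itself: if b were the first member of struct A, struct B * pb = (struct B *)pa; would be well defined, because a pointer to a structure object, suitably converted, points to its initial member (C11 6.7.2.1p15).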
Answer by chqrlie

There is no undefined behavior in the code posted. Using pointers to structure members, such as pb, is fine; you could also have a pointer to the x member of *pb:

    int *pi = &pb->x;

What you must be careful about is the lifetime of the objects pointed to by these pointers. When an object's lifetime ends (for a local variable, when execution leaves its block), pointers to it become invalid: they are dangling pointers, and dereferencing them for reading or writing has undefined behavior.

Here is an example:

struct B { int x; };
struct A { int foo; struct B b; int bar; };

struct A global_a = { 0 };

struct A *get_ptr_to_struct_A()
{
    return &global_a;
}

void modify_struct_B(struct B *p)
{
    p->x++;
}

int main()
{
    struct A *pa = get_ptr_to_struct_A();
    struct B *pb = &pa->b;

    modify_struct_B(pb); // no problem 

    {
        struct A local_a = { 0 };
        pa = &local_a;
        pb = &pa->b;
        modify_struct_B(pb); // no problem
    }

    modify_struct_B(pb); // undefined behavior. local_a has gone out of scope
    return 0;
}
Answer by supercat

Given the definitions:

struct s {int x[2]; };
struct t {struct s sv[2]; };
struct u {struct t tv[2]; };

the Standard specifies in N1570 6.5p7 the types of lvalue that may be used to access the stored value of an object of type struct t. In particular, the storage may be accessed using lvalues of type struct t, struct u, or other structures containing a member of type struct t, or qualified variants of those types. The list of allowable types does not include struct s or int.

Obviously, arrays within structures would be rather useless if one couldn't use lvalues of the element types to access their members. There are at least three ways of resolving this problem:

  1. Treat accesses of the form someStruct.member and someStruct.arrayTypeMember[index] as though the storage is being accessed "by" an lvalue of the structure type, even though that would contradict the part of the Standard that says the latter form is equivalent to *(someStruct.arrayTypeMember+index). Both clang and gcc seem to do this.

  2. Treat the Standard as allowing structures' stored values to be accessed by lvalues of member type that have no apparent relation to any containing structure type. Clang and gcc seem to do this as well.

  3. Treat any pointer which is freshly visibly derived from the address of an object of some other type as being capable, at least within the context of such derivation, of accessing storage of the type from which it was derived.

Under interpretation #3, a compiler given e.g.

struct writeThingie { int len, size; int *dat; } *it;

void append(int something) /* hypothetical wrapper function */
{
    /* ... typically executed within some loop: */
    if (it->len < it->size)
        it->dat[it->len++] = something;
}

would only need to accommodate the possibility that the assignment might modify the value of it->size if something within the function took the address of member len of a structure writeThingie and could have caused it to be stored in it->dat.

I think it would have been considered obvious and non-controversial that code should be able to use an lvalue of the form structPtr->array[index] to access an element of an array-type member of the pointer's target type, but different compilers would have achieved that result via different means, and no single set of rules could accurately describe them all. Instead, the Standard waived jurisdiction on the assumption that compiler writers would try to satisfy programmers' needs.