aliasing issue with OO inheritance in C

I'm implementing inheritance in C (i.e., calling a superclass's function with a subclass object), but I've run into an aliasing issue.

In the code below, a Shape function is called with a Rectangle object.

With me->super, x and y are correct. With (Shape*)me, however, they're wrong.

I prefer (Shape*)me over me->super because I want to hide the struct implementation from clients.

Shouldn't the C standard guarantee that? Per section 6.7.2.1, paragraph 13:

“… A pointer to a structure object, suitably converted, points to its initial member. There may be unnamed padding within a structure object, but not at its beginning”.

/*shape.h*/
#ifndef SHAPE_H
#define SHAPE_H

typedef struct Shape Shape;

Shape* Shape_ctor(int x, int y);
int Shape_getX(Shape* me);
int Shape_getY(Shape* me);

#endif

/*shape.c*/
#include <stdlib.h>
#include "shape.h"

struct Shape
{
    int x;
    int y;
};

Shape* Shape_ctor(int x, int y)
{
    Shape* me = malloc(sizeof(struct Shape));
    me->x = x;
    me->y = y;

    return me;
}

int Shape_getX(Shape* me)
{
    return me->x;
}

int Shape_getY(Shape* me)
{
    return me->y;
}

/*rectangle.h*/
#ifndef RECT_H
#define RECT_H

#include "shape.h"

typedef struct Rectangle Rectangle;


Rectangle* Rectangle_ctor(int x, int y, unsigned int width, unsigned int height);

int Rectangle_getWidth(Rectangle* me);
int Rectangle_getHeight(Rectangle* me);

#endif

/*rectangle.c*/
#include <stdlib.h>
#include "rectangle.h"
#include <stdio.h>

struct Rectangle
{
    Shape* super;
    unsigned int width;
    unsigned int height;
};

Rectangle* Rectangle_ctor(int x, int y, unsigned int width, unsigned int height)
{
    Rectangle* me = malloc(sizeof(struct Rectangle));
    me->super = Shape_ctor(x, y);
    me->width = width;
    me->height = height;

    printf("x: %d\n", Shape_getX(me->super)); //correct value
    printf("y: %d\n", Shape_getY(me->super)); //correct value

    printf("x: %d\n", Shape_getX((Shape*)me)); // wrong value
    printf("y: %d\n", Shape_getY((Shape*)me)); // wrong value

    return me;
}

int Rectangle_getWidth(Rectangle* me)
{
    return me->width;
}

int Rectangle_getHeight(Rectangle* me)
{
    return me->height;
}

/*main.c*/
#include <stdio.h>
#include "rectangle.h"

int main(void) {

  Rectangle* r1 = Rectangle_ctor(0, 2, 10, 15);
  printf("r1: (x=%d, y=%d, width=%d, height=%d)\n", Shape_getX((Shape*)r1)
                                                  , Shape_getY((Shape*)r1)
                                                  , Rectangle_getWidth(r1)
                                                  , Rectangle_getHeight(r1));

  return 0;
}
Answer by tstanisl

Place the base type itself as the first member, not a pointer to the base type.

struct Rectangle {
    Shape super;
    ...
};

Moreover, you should redesign Shape_ctor: take a pointer to the object as a parameter and delegate memory management to the caller.


Shape* Shape_ctor(Shape *me, int x, int y)
{
    me->x = x;
    me->y = y;

    return me;
}

The Rectangle constructor would then be:

Rectangle* Rectangle_ctor(Rectangle *me, int x, int y, unsigned int width, unsigned int height)
{
    Shape_ctor(&me->super, x, y); // call base constructor
    me->width = width;
    me->height = height;

    printf("x: %d\n", Shape_getX(&me->super)); //correct value
    printf("y: %d\n", Shape_getY(&me->super)); //correct value

    return me;
}

Typical usage:

Rectangle rect;
Rectangle_ctor(&rect, ...);

or somewhat more exotic variants:

Rectangle* rect = malloc(sizeof *rect);
Rectangle_ctor(rect, ...);

// or
Rectangle* rect = Rectangle_ctor(malloc(sizeof *rect), ...);

// or even a pointer to an object with automatic storage (compound literal)
Rectangle* rect = Rectangle_ctor(&(Rectangle){0}, ...);

A cast is then needed only to implement virtual methods such as Shape_getArea().

struct Shape {
  ...
  double (*getArea)(struct Shape*);
};

double Shape_getArea(Shape *me) {
  return me->getArea(me);
}
double Rectangle_getArea(Shape *base) {
  Rectangle *me = (Rectangle*)base; // the only cast
  return (double)me->width * me->height;
}

Rectangle* Rectangle_ctor(Rectangle *me, int x, int y, unsigned int width, unsigned int height) {
  ...
  me->super.getArea = Rectangle_getArea;
  ...
}

// usage:
Rectangle rect;
Rectangle_ctor(&rect, 0, 0, 3, 2);

Shape *shape = &rect.super;

Shape_getArea(shape); // should return 6

EDIT

To hide the internals of Shape, put a pointer to its private data in the structure, and initialize that pointer with the relevant data in Shape_ctor.

struct Shape {
  void *private_data;
  // non private fields
};

Answer by MSalters

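The initial member of Rectangle is a Shape*, not a Shape, and the initial member of Shape is an int, not a Shape*. The commonality you assume just isn't there: (Shape*)me points at the Shape* member stored inside the Rectangle, so on typical implementations Shape_getX ends up reading the bytes of that pointer as if they were the int x.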

For comparison, C++ implementations typically handle single inheritance by placing the base-class subobject at offset 0 inside the full object. Pointers to base-class subobjects are used for virtual inheritance, but that quickly gets complex.

Answer by supercat

When C99 was written, the committee was split. Some members refused to let the Standard characterize as illegitimate certain useful constructs that exploit the Common Initial Sequence guarantees; others refused to let it characterize as illegitimate optimizations that would break code relying on such constructs but would benefit code that didn't.

Such conflicts were resolved by writing ambiguous rules which both sides could interpret as saying what they wanted them to say. Since there was never any consensus about what the rules should mean, questions about what the rules really mean are inherently unanswerable.

Even going back to C89, the only way the rules really make sense is if one interprets the phrase "by an lvalue of a particular type" as applying to dereferenced pointers which are freshly visibly derived from something of appropriate type. Otherwise, given something like:

struct s { int x[2]; } foo;

an access to foo.x[1] would violate the type-aliasing rules, since it is defined as equivalent to *(foo.x+1). The inner expression (foo.x+1) is a pointer of type int*, which has no relation to type struct s, and int is not among the types that may be used to access an object of type struct s. Any decent compiler should obviously recognize, however, that the pointer is freshly and visibly derived from an object of type struct s, and treat the access as though it were performed via that type.

The question of when a pointer is "freshly visibly derived" is a quality-of-implementation issue outside the Standard's jurisdiction, with the expectation that compilers would make a reasonable effort to notice such things whether or not the Standard required them to. Most conflicts stem from the fact that some compiler maintainers who aren't interested in selling compilers deliberately ignore any forms of derivation beyond those needed to avoid making structures completely useless.

So far as I can tell, all compilers not based on clang or gcc support common constructs that exploit the Common Initial Sequence rules, even when type-based aliasing analysis is enabled. Further, both clang and gcc will sometimes perform erroneous "optimizations" even on some strictly conforming programs when their type-based aliasing analysis is enabled. So rather than jumping through hoops to be compatible with the broken optimization modes of clang and gcc, I'd recommend simply documenting that the code requires implementations, as a "conforming language extension", to make a reasonable effort to process constructs involving the Common Initial Sequence guarantees usefully.