What is happening under the hood of virtual inheritance?


Recently I have been trying to make a plugin for an old game, and I ran into a problem similar to diamond inheritance.

I have a very reduced example, written as follows:

#include <iostream>
#include <stdint.h>
#include <stddef.h>

using namespace std;

struct CBaseEntity
{
    virtual void Spawn() = 0;
    virtual void Think() = 0;

    int64_t m_ivar{};
};

struct CBaseWeapon : virtual public CBaseEntity
{
    virtual void ItemPostFrame() = 0;

    double m_flvar{};
};

struct Prefab : virtual public CBaseEntity
{
    void Spawn() override { cout << "Prefab::Spawn\n"; }
    void Think() override { cout << "Prefab::Think\n"; }
};

struct WeaponPrefab : virtual public CBaseWeapon, virtual public Prefab
{
    void Spawn() override { cout << boolalpha << m_ivar << '\n'; }
    void ItemPostFrame() override { m_flvar += 1; cout << m_flvar << '\n'; }

    char words[8];
};

int main() noexcept
{
    cout << sizeof(CBaseEntity) << '\n';
    cout << sizeof(CBaseWeapon) << '\n';
    cout << sizeof(Prefab) << '\n';
    cout << sizeof(WeaponPrefab) << '\n';

    cout << offsetof(WeaponPrefab, words) << '\n';
}

The first two are extracted from the game's source code and I made them pure virtual classes since I have no need to instantiate them. The third class (Prefab) is the one I extend all my classes in my mod from.

The problem is: I just noticed that the class size changed, which could indicate an ABI break waiting for me. When I remove all virtual keywords from the inheritance lists, the class size is quite small and the memory layout makes sense to me. But whenever I turn virtual inheritance on, the size suddenly blows up and the layout becomes a mystery.

For example, when I print the offsetof of a member variable in my WeaponPrefab class, it shows 8, but the total size is 48, which doesn't make sense to me: where are m_ivar and m_flvar?

(I am not trying to provoke people with undefined behavior, but just trying to cope with the existing ABI in the original game.)

Link to Compiler Explorer: https://godbolt.org/z/YvWTbf8j8


There are 3 best solutions below

Miles Budnek On BEST ANSWER

Warning: this is all an implementation detail. Different compilers may implement the specifics differently, or may use different mechanisms altogether. This is just how GCC does it in this specific situation.

Note that I'm ignoring the vtable pointers used to implement virtual method dispatch throughout this answer to focus on how virtual inheritance is implemented.

Using normal, non-virtual inheritance, a WeaponPrefab would include two CBaseEntity sub-objects: one that it inherits via CBaseWeapon and one that it inherits via Prefab. It would look something like this:

 WeaponPrefab
 ┌─────────────────────┐
 │ CBaseWeapon         │
 │ ┌─────────────────┐ │
 │ │ CBaseEntity     │ │
 │ │ ┌─────────────┐ │ │
 │ │ │ int64_t     │ │ │
 │ │ │ ┌─────────┐ │ │ │
 │ │ │ │ m_ivar  │ │ │ │
 │ │ │ └─────────┘ │ │ │
 │ │ └─────────────┘ │ │
 │ │  double         │ │
 │ │  ┌─────────┐    │ │
 │ │  │ m_flvar │    │ │
 │ │  └─────────┘    │ │
 │ └─────────────────┘ │
 │ Prefab              │
 │ ┌─────────────────┐ │
 │ │ CBaseEntity     │ │
 │ │ ┌─────────────┐ │ │
 │ │ │ int64_t     │ │ │
 │ │ │ ┌─────────┐ │ │ │
 │ │ │ │ m_ivar  │ │ │ │
 │ │ │ └─────────┘ │ │ │
 │ │ └─────────────┘ │ │
 │ └─────────────────┘ │
 │  char[8]            │
 │  ┌─────────┐        │
 │  │ words   │        │
 │  └─────────┘        │
 └─────────────────────┘

Virtual inheritance allows you to avoid this. Each object will have only one sub-object of each type that it inherits from virtually. In this case, the two CBaseEntity sub-objects are combined into one:

WeaponPrefab
┌───────────────────┐
│   char[8]         │
│   ┌─────────┐     │
│   │ words   │     │
│   └─────────┘     │
│ Prefab            │
│ ┌───────────────┐ │
│ └───────────────┘ │
│ CBaseWeapon       │
│ ┌───────────────┐ │
│ │  double       │ │
│ │  ┌─────────┐  │ │
│ │  │ m_flvar │  │ │
│ │  └─────────┘  │ │
│ └───────────────┘ │
│ CBaseEntity       │
│ ┌───────────────┐ │
│ │  int64_t      │ │
│ │  ┌─────────┐  │ │
│ │  │ m_ivar  │  │ │
│ │  └─────────┘  │ │
│ └───────────────┘ │
└───────────────────┘

This presents a problem though. Notice that in the non-virtual example CBaseEntity::m_ivar is always 0 bytes into a Prefab object, whether it's standalone or a sub-object of a WeaponPrefab. But in the virtual example the offset is different. For a standalone Prefab object CBaseEntity::m_ivar would be offset 0 bytes from the start of the object, but for a Prefab that's a sub-object of a WeaponPrefab it would be offset 8 bytes from the start of the Prefab sub-object.

To get around this problem, objects generally carry an extra pointer to a static table generated by the compiler that contains offsets to each of their virtual base classes:

                              Offset Table for
WeaponPrefab                  standalone WeaponPrefab
┌────────────────────┐        ┌──────────────────────┐
│   Offset Table Ptr │        │Prefab offset:      16│
│   ┌─────────┐      │        │CBaseWeapon offset: 24│
│   │         ├──────┼───────►│CBaseEntity offset: 40│
│   └─────────┘      │        └──────────────────────┘
│   char[8]          │
│   ┌─────────┐      │
│   │ words   │      │
│   └─────────┘      │
│ Prefab             │        Offset Table for
│ ┌────────────────┐ │        Prefab in WeaponPrefab
│ │Offset Table Ptr│ │        ┌──────────────────────┐
│ │  ┌─────────┐   │ │        │CBaseEntity offset: 24│
│ │  │         ├───┼─┼───────►│                      │
│ │  └─────────┘   │ │        └──────────────────────┘
│ └────────────────┘ │
│ CBaseWeapon        │        Offset Table for
│ ┌────────────────┐ │        CBaseWeapon in WeaponPrefab
│ │Offset Table Ptr│ │        ┌──────────────────────┐
│ │  ┌─────────┐   │ │        │CBaseEntity offset: 16│
│ │  │         ├───┼─┼───────►│                      │
│ │  └─────────┘   │ │        └──────────────────────┘
│ │  double        │ │
│ │  ┌─────────┐   │ │
│ │  │ m_flvar │   │ │
│ │  └─────────┘   │ │
│ └────────────────┘ │
│ CBaseEntity        │
│ ┌────────────────┐ │
│ │  int64_t       │ │
│ │  ┌─────────┐   │ │
│ │  │ m_ivar  │   │ │
│ │  └─────────┘   │ │
│ └────────────────┘ │
└────────────────────┘

Note that this isn't precisely accurate. Since Prefab has no data members, GCC actually avoids giving it its own offset table and instead has it share WeaponPrefab's table and pointer. This diagram is how GCC would lay the object out if Prefab did have at least one data member.

ALX23z On

I've run the code, and the reported class sizes were

sizeof(CBaseEntity) = 16 
sizeof(CBaseWeapon) = 32
sizeof(Prefab) = 24
sizeof(WeaponPrefab) = 48

Generally speaking, the implementation of virtual functions and virtual inheritance is left to the compiler and can vary depending on the compiler and other options. That being said, perhaps I can provide some explanation of the sizes of the objects, at least for one possible implementation.

CBaseEntity is simply a polymorphic type and thus has a pointer to a vtable (true for all implementations of C++ I am aware of, but not mandated by the standard); it also contains an int64_t. Size of the pointer = 8, size of int64_t = 8, so in total it is exactly 16.

CBaseWeapon inherits from CBaseEntity and holds a double, so it already has to be at least of size 24. Now, virtual inheritance means that the distance between a CBaseWeapon object and its CBaseEntity sub-object is not fixed: only the final class determines it. This information needs to be stored inside each instance. I believe the info is located somewhere at the beginning of CBaseWeapon's layout. To contain this info, padding may have to be added so that the size stays divisible by 8 due to alignment requirements. Thus, the total size sums up to 32. Basically, it adds 16 on top of CBaseEntity.

Prefab is similar to CBaseWeapon, but it doesn't hold a double. So its size is 24, or 8 on top of CBaseEntity.

WeaponPrefab inherits virtually from CBaseEntity, CBaseWeapon and Prefab, and contains a char[8]. So, it already needs 16+16+8+8 = 48. If anything, it is surprising that WeaponPrefab isn't bigger. It is likely because Prefab doesn't have any data members, so the two classes somehow share the layout-location variable, reducing the storage size of the class. Say, if you add a double member to Prefab, the size of WeaponPrefab will increase to 64.

But, as I previously said, it depends a lot on the exact specification. I don't know which platform you are targeting. I am sure the ABI's specification is somewhere on the internet and you can look up the details. For instance, check out the Itanium C++ ABI, which may or may not be relevant to you.

Edit: as was analyzed by @MilesBudnek, the "layout-location variable" is actually a pointer to a compiler-generated offset table. So it takes 8 bytes, or whatever the platform dictates.

curiousguy On

Unlike the rules for the layout of a vptr (short for vtable pointer) in each instance, and of the vtable, for SI (single inheritance), which are pretty simple (1), the rules here are quite complicated. While I'm willing to discuss them in extreme detail, I assume it can be slightly boring - and totally irrelevant if the request is simply: can I keep the same layout (and ABI) with virtual as with non-virtual inheritance?

The rules are more complicated for virtual inheritance than for simple, non-virtual inheritance, because virtual means basically the same thing (2) for member functions and for base classes: it adds a level of flexibility, a potential for changing behavior, as allowed by "overriding" (3) the assertion (4) in derived classes.

But virtual inheritance overrides base class inheritance so it affects data layout, unlike virtual function overriding!

That flexibility is implemented, as always, by adding a level of indirection. It can be done by either:

  • an internal pointer for another object fragment inside a base class subobject;
  • an offset representing the relative position of that fragment;
  • or a vptr to a vtable that holds all the information which depends on the base class subobject (a subobject of another given class) but not on the instance: the information depends on which specific base class subobject we are in (as specified by a complete inheritance path (5) from the most derived object), but not on the specific instance, since the bases of every complete object of a given type have the same relative positions.

That last choice is the most space efficient: at most one pointer per base class subobject, often fewer (the issue of when exactly another vptr will be introduced is too complex to discuss here, unless you want me to give details (6)).

NOTES

(1) Even trivial if you exclude issues such as destructor calls vs. delete operator, the typeid operator, and the power of dynamic_cast (to a class pointer or a void pointer).

(2) I know many authors explain that the virtual keyword is overloaded for two totally unrelated purposes here, but I disagree.

(3) The word overriding isn't normally used for virtual bases: we don't usually define virtual inheritance as an overriding of another inheritance, but saying so fits in the analogy with virtual functions overriding.

(4) Because base class inheritance isn't "declared" in a derived class, as : public Base is not a declaration, I use "assertion" here; the syntax : public Base "asserts" that Base is a public base, just like virtual void foo(); "asserts" that foo() is a virtual member function.

(5) A complete inheritance path mentions every single direct base needed to reach a specific indirect base, like MostDerived::Derived2::Derived1::Base. Only with such a path can you unambiguously designate a base class subobject in all cases. (7)

Derived-to-base pointer conversions are only specified in terms of allowable complete paths: a pointer can be converted to a base type only if one such path can be found (and with virtual inheritance, many such paths can exist which are equivalent).

(6) I can add a lot of fine details if you wish; it's a promise - or a threat if you fear fine ABI details...

(7) It's worth noting that early versions of C++ standards had no serious explanation of such paths, which is a fundamental C++ inheritance concept, especially when dealing with multiple inheritance.

It was left completely implicit, to be inferred by the intuitive reader: decoding the standard is an exercise in critical thinking and intuition. If you stick to the black-and-white letter of the rules, you will often miss the whole idea and get stuff wrong! But that intuitive reading can only work if educated by serious C++ skills.