C++ strict aliasing and UB


I'm reviewing some code (can't post all of it), but there's a function like this:

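```cpp
#include <cstring>  // std::memcpy

template <typename DestType, typename SourceType>
inline void transferDataAndUpdateSpan(MyArray<DestType>& to, MySpan<const SourceType>& source)
{
    static_assert(sizeof(DestType) == sizeof(SourceType), "Data size mismatch!!");
    to.resize(source.size());
    // Copy the raw object representations of the source elements into to's storage.
    std::memcpy(to.data(), source.data(), sizeof(SourceType) * source.size());
    // Rebind source so it views to's storage as SourceType (the cast being asked about).
    source = { (SourceType*)to.data(), to.size() };
}
```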

MySpan is basically a typedef for std::span and MyArray is a container which has a constructor which receives a pointer to the data and the data size.

Question: isn't source = { (SourceType*)to.data(), to.size() }; breaking strict aliasing here? Is this triggering UB?

Answer by user17732522:

First of all, the memcpy itself has undefined behavior if DestType (and presumably SourceType) isn't trivially copyable, or if the object representations in the SourceType objects aren't valid object representations for values of DestType.

A safer way to transfer the object representations would be to assign the result of std::bit_cast from each source element to the corresponding target element in a loop. std::bit_cast only accepts trivially copyable types of equal size, so it would verify trivial copyability and also subsume the size check you currently do manually.

Then, you say "MyArray is a container which has a constructor which receives a pointer to the data and the data size": But the constructor isn't used anywhere. You are just copying object representations. So hopefully to.data() is actually a pointer into an array of DestType objects into which memcpy can copy the object representations.

Then, (SourceType*)to.data() is a C-style cast, and C-style casts are discouraged for good reason, especially in generic code like this: depending on the types SourceType and DestType, the same cast can have completely different meanings.

If SourceType is, for example, a base class of DestType, the cast will be a static_cast and the result will be a pointer to the SourceType base-class subobject. In general this changes the address held by the pointer, and the cast may fail to compile if the base class is inaccessible in the context (e.g. a private base class). Accessing the object through the resulting pointer is generally fine; however, doing pointer arithmetic on it (as your span presumably does) would be UB, because the array into which the pointer points is an array of DestType objects, not SourceType objects. It is UB to do pointer arithmetic with a base-class pointer into an array of derived-class objects.

If SourceType is a derived class of DestType, the cast will still be a static_cast, but it will have undefined behavior, because it tries to downcast a DestType object to an enclosing SourceType object that doesn't exist. The exception is when implicit object creation applies, as detailed below.

If there is no such base-class relation, the cast will end up as a reinterpret_cast, which generally does not change the address.
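The difference between the static_cast and reinterpret_cast interpretations is easiest to see with multiple inheritance. This is a sketch with hypothetical types A, B and D; on common ABIs the second base B sits at a non-zero offset inside D, so the two casts produce different addresses.

```cpp
#include <cassert>

// Hypothetical hierarchy: D has two non-empty bases, which therefore occupy
// distinct storage, so at most one of them can share D's address.
struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };

// Related types: the C-style cast acts as a static_cast and adjusts the
// address to the B base-class subobject.
inline const void* viaCStyleCast(const D* p) { return (const B*)p; }

// reinterpret_cast between the same types keeps the address; the result still
// points to the D object, so reading ->b through it would be UB.
inline const void* viaReinterpretCast(const D* p) { return reinterpret_cast<const B*>(p); }
```

In generic code like transferDataAndUpdateSpan, which of these two behaviors you get depends silently on the template arguments, which is the main argument against the C-style cast there.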

Generally, a reinterpret_cast also doesn't change which object the pointer points to. For example, with SourceType = float and DestType = int, (SourceType*)to.data() will point to a DestType object, not a SourceType object. In such a situation the aliasing rule, as well as rules for e.g. member access expressions, apply and make almost any use of the resulting pointer UB, with very few exceptions.
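To make the float/int case concrete, here is a sketch contrasting the aliasing violation (kept in a comment) with a well-defined alternative. It assumes a 32-bit IEEE-754 float, which the static_asserts check; floatBits is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(std::uint32_t) == sizeof(float), "sketch assumes 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE-754 float");

// UB: the pointer would still point to a float object, so reading it as
// std::uint32_t violates the aliasing rule:
//   std::uint32_t bits = *reinterpret_cast<const std::uint32_t*>(&f);

// Well-defined: copy the object representation into a genuine std::uint32_t object.
inline std::uint32_t floatBits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically optimize this memcpy into a single register move, so the well-defined version usually costs nothing extra.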

The exception, where a reinterpret_cast does change the pointer value to point to a different object, is when there is an object of the target type that is pointer-interconvertible with the original object. That applies, for example, to the first non-static data member of a standard-layout class without base classes. In that case aliasing can't be an issue, because the resulting pointer points to the actual subobject whose type matches the pointer type. However, pointer arithmetic still faces exactly the same UB problem as stated above for static_cast.
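A sketch of the pointer-interconvertible case, using a hypothetical standard-layout struct Wrapper whose first member is an int; firstMember is likewise a hypothetical name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical standard-layout type: no base classes, first member is an int.
struct Wrapper { int value; double extra; };
static_assert(std::is_standard_layout_v<Wrapper>);

inline int* firstMember(Wrapper& w)
{
    // Wrapper and its first member are pointer-interconvertible, so this
    // reinterpret_cast yields a pointer to the int subobject itself.
    return reinterpret_cast<int*>(&w);
}

// Still UB: using the result as if it pointed into an array of int, e.g.
// trying to reach arr[1].value via arithmetic on firstMember(arr[0]).
```

Reads and writes through the returned pointer are fine because it genuinely points to an int; only arithmetic that pretends the Wrapper array is an int array is the problem.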

Furthermore, a reinterpret_cast can result in an unspecified pointer value (which then causes UB on access or use in most expressions) if the address isn't sufficiently aligned for SourceType, which can happen if, e.g., SourceType has a stricter alignment requirement than DestType.
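A sketch of two ways to guard against this. The compile-time condition could be added to the original function as static_assert(alignof(SourceType) <= alignof(DestType)), since to.data() is aligned for DestType. The runtime check assumes the usual flat mapping of pointers to std::uintptr_t, which the standard leaves implementation-defined. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Compile-time condition: storage suitably aligned for DestType is then
// also suitably aligned for SourceType.
template <typename SourceType, typename DestType>
inline constexpr bool alignmentSafeCast = alignof(SourceType) <= alignof(DestType);

// Runtime check for an arbitrary address (relies on implementation-defined
// pointer-to-integer conversion behaving as a plain address).
inline bool isAlignedFor(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```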

In all of these cases, however, the cast itself does not cause UB (with the exception of the downcast case mentioned above).


Additional note: If SourceType is an implicit-lifetime type, memcpy could implicitly create SourceType objects in the to.data() storage, ending the lifetime of the previous DestType objects (and potentially of the whole MyArray object). In that case, assuming alignment is not a problem as stated above, there would be no issue with the cast to SourceType (except for a missing std::launder). However, accessing the elements of to as DestType later, or potentially using to at all, would then cause UB for accessing out-of-lifetime objects (or UB from pointer arithmetic in the case that DestType is a base of SourceType). That's probably not the intended use case.


To simplify all of this: you can't use a memory region as both SourceType and DestType at the same time if the two types differ in more than cv-qualification, with very few exceptions, and practically no exceptions if you also want to do pointer arithmetic on the region in both types at the same time.


All of the above is based strictly on what the standard says is or isn't UB. Practically speaking, for example, I do not expect any compiler to behave unexpectedly when doing pointer arithmetic in the wrong type, as long as the types have the same size and alignment requirement. However, violations of the aliasing rule will cause problems in practice.