Can the comparison of multiple contiguous unsigned integers be optimized in a safe manner?

53 Views Asked by At

In GCC 13.2 and Clang 18.0.1 using -O3, the generated assembly for operator < of a tuple of four std::uint16_t objects is less than impressive (pun intended). Two other implementations using reinterpret_cast and type punning both produce optimized assembly.

EDIT: Since this question is already getting downvotes, I'd like to point out that the default assembly uses 3 conditionals, which is quite impactful when this code is being run in a high-performance loop. Both other implementations have zero branches.

Question 1: Are these implementations correct?

Question 2: If not, what conditions are required to guarantee correctness?

Question 3: Assuming there is a test for correctness that can be evaluated at compile time, why doesn't the compiler simply output the optimized assembly?

#include <cstdint>
#include <tuple>
#include <bit>

using TupleU16 = std::tuple<std::uint16_t, std::uint16_t, std::uint16_t, std::uint16_t>;

bool less(TupleU16 lhs, TupleU16 rhs) {
    return lhs < rhs;
}

constexpr bool canSafelyUseTypePunning() {
    TupleU16 t;
    std::uint16_t* val0 = &std::get<0>(t);
    std::uint16_t* val1 = &std::get<1>(t);
    std::uint16_t* val2 = &std::get<2>(t);
    std::uint16_t* val3 = &std::get<3>(t);

    if constexpr (std::endian::native == std::endian::little) {
        return sizeof(TupleU16) == 8 &&
            (val0 - val3) == 3 && (val0 - val2) == 2 && (val0 - val1) == 1;
    }
    else if constexpr (std::endian::native == std::endian::big) {
        return sizeof(TupleU16) == 8 &&
            (val3 - val0) == 3 && (val2 - val0) == 2 && (val1 - val0) == 1;
    }
    else {
        return false;
    }
}

bool less2(TupleU16 lhs, TupleU16 rhs) { // cannot be constexpr
    static_assert(canSafelyUseTypePunning());
    auto* lhsp = reinterpret_cast<const std::uint64_t*>(&lhs);
    auto* rhsp = reinterpret_cast<const std::uint64_t*>(&rhs);
    return *lhsp < *rhsp;
}

bool less3(TupleU16 lhs, TupleU16 rhs) { // can be constexpr
    static_assert(canSafelyUseTypePunning());
    union U {
        std::uint64_t u64;
        TupleU16 tu16;
    };
    return U{.tu16 = lhs}.u64 < U{.tu16 = rhs}.u64;
}
1

There are 1 best solutions below

2
user17732522 On

No, that's

  1. an aliasing violation causing undefined behavior, because you access the std::tuple object through a pointer of a different type and many compilers actually do make use of the aliasing rule for optimization (although there are often options to disable these optimizations)
  2. implementation-dependent, because there is no guarantee about std::tuples memory layout at all and there are in fact std::tuple implementations which order elements in different orders. They are not even guaranteed to be contiguous in memory.

The reason that std::tuple's operator< is suboptimal for your use case is probably that it guarantees that the lexicographically next element is only accessed if the previous element's comparison hasn't been conclusive yet. That means that it gives extra guarantees in the case that another thread modifies other elements of the tuple.

It seems that you don't need that guarantee and would prefer a more performant implementation. If you need that, implement a std::tuple-equivalent with these requirements and layout assumptions you need in mind yourself and for the comparison use std::bit_cast<uint64_t> instead of reinterpret_cast to get defined behavior (or equivalently std::memcpy to a local uint64_t variable). Also, if all element types of the tuple are equal, then std::array would be enough (but it has the same operator< performance issue).