Emplace or merge on std::unordered_set


I am trying to implement this emplace-or-merge function:

template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
                  T&& t,
                  std::function<void(T&& a, T& b)> merge)
{
    auto it = s.emplace(std::move(t));
    T& u = const_cast<T&>(*it.first);
    if (!it.second)
        merge(std::move(t), u);
    return u;
}

The merge function modifies its second argument in a way that preserves its hash value. I am concerned about the use of std::move(t) in the merge case, because the emplace may already have moved from it. I have read Microsoft's implementation of unordered_set, and it has a very nice special case for this: if constexpr (_In_place_key_extractor::_Extractable). It recognizes that its argument std::move(t) can be hashed directly (and compared with operator== directly) without constructing another T, and immediately returns the equivalent value in the unordered_set when there is one.

Does this special case occur in all implementations of the standard library? If not, I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge.


There are 2 best solutions below

ecatmur On BEST ANSWER

No, libstdc++ does not perform this optimization.

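#include <cstddef>
#include <format>
#include <functional>
#include <iostream>
#include <iterator>
#include <unordered_set>
#include <utility>

struct A {
    A() = default;
    // Log every move construction so we can see when emplace builds an element.
    A(A&&) { std::format_to(std::ostreambuf_iterator<char>(std::cout), "A(A&&)\n"); }
    bool operator==(A const&) const = default;
};
// Constant hash: all A objects are equivalent, so the second emplace cannot insert.
template<> struct std::hash<A> { std::size_t operator()(A const&) const { return 0; } };
int main() {
    std::unordered_set<A> s;
    A a;
    // The first emplace inserts; the second finds an equivalent element already present.
    std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
        s.emplace(std::move(a)).second);
    std::format_to(std::ostreambuf_iterator<char>(std::cout), "{}\n",
        s.emplace(std::move(a)).second);
}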

This program prints:

A(A&&)
true
false

under libc++ (and, presumably, under MS-STL), but prints

A(A&&)
true
A(A&&)
false

under libstdc++.



"I wonder if there is another way to code this EmplaceOrMerge."

You can't get around the fact that libstdc++ will only call Hash on an already constructed node. If you can't change the data structure (e.g. to a std::unordered_map from an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:

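#include <type_traits>
#include <unordered_set>
#include <utility>

template<class T>
auto try_emplace(std::unordered_set<T>& s, std::type_identity_t<T>&& t) {
    // Build the node in a scratch set so t is moved from exactly once, then extract it.
    std::unordered_set<T> s2;
    auto nh = s2.extract(s2.insert(std::move(t)).first);
    auto const ins = s.insert(std::move(nh));
    // On failure the returned node handle still owns the value; move it back into t.
    if (not ins.inserted)
        t = std::move(ins.node.value());
    return std::pair(ins.position, ins.inserted);
}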


And in your case, you can skip the move-assign back, so the overhead is only one move (and an extra node allocation):

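#include <functional>
#include <unordered_set>
#include <utility>

template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
                  T&& t,
                  std::function<void(T&& a, T& b)> merge)
{
    // Move t into a node owned by a scratch set, then extract that node.
    std::unordered_set<T> s2;
    auto nh = s2.extract(s2.insert(std::move(t)).first);
    auto const ins = s.insert(std::move(nh));
    // The const_cast is only safe while merge preserves the hash and equality of u.
    T& u = const_cast<T&>(*ins.position);
    // On failure the value lives on in ins.node, so merge from there rather than from t.
    if (not ins.inserted)
        merge(std::move(ins.node.value()), u);
    return u;
}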
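
The other option mentioned above, changing the data structure to a std::unordered_map keyed on an extracted key, could look roughly like the sketch below. It relies on std::unordered_map::try_emplace having no effect when the key already exists, so the value has not been moved from when the merge branch runs. The names TryEmplaceOrMerge and map_example, the Merge template parameter (used here instead of std::function so that a lambda can be passed directly), and the int/std::string key and value types are assumptions made for illustration, not part of the question.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: the element is split into a key and a mapped value.
template <typename K, typename V, typename Merge>
V& TryEmplaceOrMerge(std::unordered_map<K, V>& m, const K& key, V&& value,
                     Merge&& merge) {
    // try_emplace does nothing when the key exists, so value stays intact.
    auto [it, inserted] = m.try_emplace(key, std::move(value));
    if (!inserted)
        std::forward<Merge>(merge)(std::move(value), it->second);
    return it->second;
}

// Hypothetical usage: merging appends the new string to the stored one.
std::string map_example() {
    std::unordered_map<int, std::string> m;
    auto merge = [](std::string&& a, std::string& b) { b += a; };
    TryEmplaceOrMerge(m, 1, std::string("foo"), merge);
    std::string& v = TryEmplaceOrMerge(m, 1, std::string("bar"), merge);
    assert(m.size() == 1);
    return v;
}
```

Because the mapped value is not const, this version also needs no const_cast.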
LoS On

std::unordered_set stores unique keys, so before inserting it checks whether the key is already present in the hash table. If it is, no insertion takes place. cppreference describes the emplace member function as inserting 'a new element into the container constructed in-place with the given args if there is no element with the key in the container'. It also notes that 'the element may be constructed even if there already is an element with the key in the container, in which case the newly constructed element will be destroyed immediately'. So a failed emplace can still move from your argument and leave it in an unspecified state, depending on the implementation of its move constructor. The core problem is that you should not use emplace(Args&&...) here, both for the reason above and because emplace is meant to construct the element in place, whereas you are simply moving an existing T. If you really want in-place construction, pass the constructor arguments to emplace with perfect forwarding instead of moving a finished object.

Since @krisz pointed out in a comment that the insert(value_type&&) member function can also leave the element in an unspecified state (it does not guarantee that the move constructor or move assignment operator is called only when the key is absent and the insertion actually happens), I advise you to change your code to check whether the key already exists before inserting or moving it.
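
A minimal sketch of that check-first approach follows, assuming a second hash computation on the insert path is acceptable. The names Item, FindThenEmplaceOrMerge and find_first_example are hypothetical, and the merge callable is taken as a template parameter rather than std::function so that a lambda can be passed directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical element type: only key takes part in hashing and equality,
// so merging into payload leaves the hash unchanged.
struct Item {
    int key;
    std::string payload;
    bool operator==(const Item& other) const { return key == other.key; }
};

template <>
struct std::hash<Item> {
    std::size_t operator()(const Item& i) const noexcept {
        return std::hash<int>{}(i.key);
    }
};

// Look up first; t is moved from only once the branch is known.
template <typename T, typename Merge>
T& FindThenEmplaceOrMerge(std::unordered_set<T>& s, T&& t, Merge&& merge) {
    auto it = s.find(t);  // read-only lookup, t is untouched
    if (it != s.end()) {
        // Same caveat as in the question: merge must preserve the hash of u.
        T& u = const_cast<T&>(*it);
        std::forward<Merge>(merge)(std::move(t), u);
        return u;
    }
    return const_cast<T&>(*s.insert(std::move(t)).first);
}

// Hypothetical usage: the second call merges instead of inserting.
std::string find_first_example() {
    std::unordered_set<Item> s;
    auto merge = [](Item&& a, Item& b) { b.payload += a.payload; };
    FindThenEmplaceOrMerge(s, Item{1, "foo"}, merge);
    Item& u = FindThenEmplaceOrMerge(s, Item{1, "bar"}, merge);
    assert(s.size() == 1);
    return u.payload;
}
```

The trade-off is that a successful insertion hashes the key twice (once in find, once in insert), in exchange for never touching t until the outcome is known.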