I am trying to implement this emplace or merge
template<typename T>
T& EmplaceOrMerge(std::unordered_set<T>& s,
T&& t,
std::function<void(T&& a, T& b)> merge)
{
auto it = s.emplace(std::move(t));
T& u = const_cast<T&>(*it.first);
if (!it.second)
merge(std::move(t), u);
return u;
}
The merge function modifies its second argument in a way that preserves its hashed value. I am concerned with the use of std::move(t) in the merge case, because the emplace may have moved it already. I have read Microsoft's implementation of unordered_set and there is a very nice special case for that, if constexpr (_In_place_key_extractor::_Extractable). It recognizes that its argument std::move(t) can be hashed directly (and compared with operator== directly), without constructing another object T, and returns immediately the equivalent value in the unordered_set when there is one.
Does this special case occur in all implementations of the standard library ? If not I have undefined behaviour, and I wonder if there is another way to code this EmplaceOrMerge.
No, libstdc++ does not perform this optimization.
This program prints:
under libc++ (and, presumably, under MS-STL), but prints
under libstdc++.
Demo.
You can't get around the fact that libstdc++ will only call
Hashon an already constructed node. If you can't change the data structure (e.g. to astd::unordered_mapfrom an extracted key) one option would be to use the node-handle interface, which can avoid side effects on failed insertion. Using it may still pay the cost of a move and move-assign back, but hopefully that is relatively cheap:Demo.
And in your case, you can shortcut the move-assign back, so the overhead is only one move (and extra node allocation):