Several sources, e.g., Why is it not good to use recursive inheritance for std::tuple implementations?, point out the compile-time cost of using recursive template implementation for tuples. I'm considering writing a system that will use 100s of tuples which might each be of size 1-100 (median around 5 to 10 I would guess), and wondering whether I should care that the standard libraries I will be using have the recursive implementation? Can anyone quantify this -- e.g. impact on build time is of order of 25%, or memory usage is of order of 50% more? I've been bitten badly before with other template instantiation issues, hence my concern.
My main interest is creating tuples, and then using iterating them using std::get, likely using code on the following lines:
// for_each_tuple: from https://www.cppstories.com/2022/tuple-iteration-apply/
template <typename TupleT, typename Fn>
void for_each_tuple(TupleT&& tp, Fn&& fn)
{
std::apply([&fn]<typename... T>(T&&... args) { (fn(std::forward<T>(args)), ...); },
std::forward<TupleT>(tp));
}
std::get is clearly non-recursive, hence O(1) for the compiler, for libstdc++, but the code for msvc and gcc isn't as clear; I think msvc is recursive, O(n), but I can't figure out the gcc code. Are they both recursive?
msvc has std::get
template <size_t _Index, class... _Types>
constexpr tuple_element_t<_Index, tuple<_Types...>>& get(tuple<_Types...>& _Tuple) noexcept {
using _Ttype = typename tuple_element<_Index, tuple<_Types...>>::_Ttype;
return static_cast<_Ttype&>(_Tuple)._Myfirst._Val;
}
then (in utility)
template <size_t _Index, class _This, class... _Rest>
struct tuple_element<_Index, tuple<_This, _Rest...>>
: tuple_element<_Index - 1, tuple<_Rest...>> {}; // recursive tuple_element definition
template<size_t __i, typename... _Elements>
constexpr __tuple_element_t<__i, tuple<_Elements...>>&
get(tuple<_Elements...>& __t) noexcept
{ return std::__get_helper<__i>(__t); }
template<size_t __i, typename _Head, typename... _Tail>
constexpr _Head&
__get_helper(_Tuple_impl<__i, _Head, _Tail...>& __t) noexcept
{ return _Tuple_impl<__i, _Head, _Tail...>::_M_head(__t); }
template <size_t _Ip, class ..._Tp>
typename tuple_element<_Ip, tuple<_Tp...> >::type&
get(tuple<_Tp...>& __t) _NOEXCEPT
{
typedef typename tuple_element<_Ip, tuple<_Tp...> >::type type;
return static_cast<__tuple_leaf<_Ip, type>&>(__t.__base_).get();
}
For construction, the two standard libraries I am using, msvc and gcc, both use a recursive approach.
msvc
template <class _This, class... _Rest>
class tuple<_This, _Rest...> : private tuple<_Rest...> { // recursive tuple definition
public:
using _This_type = _This;
using _Mybase = tuple<_Rest...>;
gcc
/**
* Recursive tuple implementation. Here we store the @c Head element
* and derive from a @c Tuple_impl containing the remaining elements
* (which contains the @c Tail).
*/
template<size_t _Idx, typename _Head, typename... _Tail>
struct _Tuple_impl<_Idx, _Head, _Tail...>
: public _Tuple_impl<_Idx + 1, _Tail...>,
private _Head_base<_Idx, _Head>
{
template<size_t, typename...> friend struct _Tuple_impl;
c.f. clang
template <class ..._Tp>
class tuple
{
typedef __tuple_impl<typename __make_tuple_indices<sizeof...(_Tp)>::type, _Tp...> _BaseT;
_BaseT __base_;
In summary, if I've read the standard library code right, it seems like at least one of msvc and gcc std::get is recursive hence O(n) for the compiler and O(n^2) for the for_each code I lifted from cpp stories. Is it possible to get the iteration down to O(n) for the compiler? And even then is there still a significant time/space cost for the compiler due to the recursive nature of tuple construction on msvc and gcc?
(I'm using msvc 2022 17.6, gcc 13.2, clang 16, and have deleted some boilerplate in the excerpts from the standard libraries, to aid readability.)
I ended up doing the work to measure performance for different tuple sizes and different implementations (msvc, libc++ libstdc++, boost hana tuple) -- what I was trying to avoid by asking the question :).
Main conclusions from the performance data I collected.
std::tuple, whereas withhana::tuple, msvc compile times were pretty much instantaneous. Even compiling a 3-std::tupleI saw a compile time of about .8s compared to a flat 0.05s forhana::tupleregardless of size.std::tupledoes indeed scale very well. However, the compiler appears slower, hence these gains are only apparent for large tuples with size over 100.Hana's claim
holds out for very large tuples, and on msvc regardless of tuple size. I wonder how actively it's maintained given advances in C++ since C++14, and whether further optimizations might now be possible.
Results below. Note the performance measurements were on a slower hardware for linux (gcc and clang). Memory usage is very approximate.
Larger image including data: Time/s to compile tuple construction / get with different compilers and tuple implementations.
Test code
To bring in hana in CMake,
FetchContent_Declare(hana SYSTEM GIT_REPOSITORY https://github.com/boostorg/hana.git GIT_TAG ${gitTag}),FetchContent_MakeAvailable(hana), andtarget_link_libraries(mylib hana). For msvc I needed to delete the lineconstexpr auto is_literal_type = detail::hana_trait<std::is_literal_type>{}, as msvc has followed the C++20 standard and removedstd::is_literal_type(cppreference).