std::transform and contiguous memory

182 Views Asked by At

I need a lambda that applies the (negative) discrete Laplace operator (matrix) to a contiguous memory container (vector) like std::array or std::vector

Discrete Laplace Operator

Can it be undefined behavior to write it using std::transform incrementing and decrementing pointers like this?

  auto A = [n,&h2](const auto & in, auto & out)
  {
    // First line of the matrix
    out[0] = (2.*in[0] - in[1])/h2 ;
    // Middle lines of the matrix
    std::transform(std::execution::par_unseq,
                   std::next(in.cbegin()),std::next(in.cend(),-1),std::next(out.begin()),
                   [&h2](const auto & val)
                   {
                     return (-*(std::next(&val,-1)) + 2.*val - *(std::next(&val)))/h2;
                   });
    // Final line of the matrix
    out[n-1] = (-in[n-2] + 2.*in[n-1])/h2;
  };

In other words, can algorithms like for_each and transform and other parallel algorithms break contiguous memory for some execution policy?

Edit 1: Note that I don't care which element of the vector in I am processing first, what I do care is if when I do *std::next(&val) inside the lambda in std::transform I do obtain the next element in the vector in and not something undefined.

Edit 2: I'm thinking of policies that would imply copying values somewhere else (for instance a SIMD register) and execute the lambda there before bringing back the result. Is there a condition in the standard on the execution policy or on the parallel algorithm that prevents that?

2

There are 2 best solutions below

9
Artyer On

No this is not safe [algorithms.parallel.exec]p3:

Unless otherwise stated, implementations may make arbitrary copies of elements (with type T) from sequences where is_trivially_copy_constructible_v<T> and is_trivially_destructible_v<T> are true.

The solution given in the notes is to wrap the iterators:

auto ptr_view = in | std::views::transform([](const auto& i) { return &i; });
std::transform(std::execution::par_unseq,
      std::next(ptr_view.begin()),std::prev(ptr_view.end()),std::next(out.begin()),
      [&h2](auto* ptr)
      {
        return (-ptr[-1] + 2.*ptr[0] - ptr[1])/h2;
      });
0
Jonno On

As an alternative to the solution proposed by @Artyer, I propose the following solution based on a custom iterator. The language lawyers may have a better way of defining the iterator and I will bow to their superior knowledge if that is the case.

The primary advantage of the custom iterator is that it facilitates the making of arbitrary copies of elements as foreseen by algorithms.parallel.exec, p3, and may as a consequence (depending on the hardware and compiler) execute faster on large vectors.

Additionally, I considered (-*(std::next(&val,-1)) + 2.*val - *(std::next(&val)))/h2 to be somewhat undesirable on account of it creating a kind of hidden coupling between the lambda expression in the call to std::transform and the structure of the container that is passed to A.

If nothing else, you could use the algorithm below as a testing control to verify that the solution proposed by @Artyer functions and performs as expected.

    auto A = [n,&h2](const auto & in, auto & out)
    {
        using TheirIter = typename std::remove_reference<decltype(in)>
            ::type::const_iterator;
        using ValueType = typename TheirIter::value_type;

        struct Slider {
            ValueType x_2, x_1, x_0;
        };

        class MyIter {
            TheirIter iter, end;
            Slider slider;
        public:
            typedef std::input_iterator_tag iterator_category;
            typedef Slider value_type;
            typedef std::ptrdiff_t difference_type;
            typedef const Slider* pointer;
            typedef const Slider& reference;

            explicit MyIter(TheirIter _iter, TheirIter _end) :
                iter{_iter}, end{_iter},
                slider {*std::next(_iter,-2), *std::next(_iter,-1), *_iter} {}
            MyIter& operator++() { ++iter;
                slider.x_2 = slider.x_1;
                slider.x_1 = slider.x_0;
                slider.x_0 = iter != end ? *iter : 0;
                return *this; }
            MyIter operator++(int) { MyIter retval = *this; ++(*this);
                return retval; }
            bool operator==(MyIter other) const { return iter == other.iter; }
            bool operator!=(MyIter other) const { return iter != other.iter; }
            reference operator* () const { return slider; }
        };

        // First line of the matrix
        out[0] = (2.*in[0] - in[1])/h2 ;
        // Middle lines of the matrix
        std::transform(std::execution::par_unseq,
            MyIter (std::next(in.cbegin(),2), in.cend()),
            MyIter (in.cend(), in.cend()),
            std::next(out.begin()),
            [&h2](const auto & val)
            {
                 return (-val.x_2 + 2*val.x_1 - val.x_0)/h2;
            });
        // Final line of the matrix
        out[n-1] = (-in[n-2] + 2.*in[n-1])/h2;
    };