I am working on code that I want to vectorize in C++. I am using AVX2 with doubles, so the vector length is 4.
I am having difficulty working out the best approach for my case. I have a memory-aligned vector b that I want to add to another vector a, which is no problem.
Then I want to add a vector c to a, where c starts 8 bytes (one double) to the left of b in memory, so c holds the same data as b but offset by one element.
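To make the goal concrete, here is a scalar sketch of the result I am after. The function name `add_b_and_shifted` is made up for illustration, and it assumes the double just before b is readable and that a does not overlap b.

```cpp
#include <cstddef>

// Scalar reference for the desired result (hypothetical helper name).
// Assumes b[-1] is readable and a does not overlap b.
void add_b_and_shifted(double* a, const double* b, std::size_t n) {
    const double* c = b - 1;  // c starts 8 bytes (one double) before b
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += b[i];  // the aligned vector b
        a[i] += c[i];  // the same data shifted by one element
    }
}
```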
One idea is to shift b one element to the right, mask its first element to zero, add the result to a, and finally load the single scalar double just to the left of b's starting address and add it to the first element of a.
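A rough sketch of that first idea for one group of four doubles is below. The helper name `add_shift_mask_scalar` is hypothetical, and I am assuming a and b are 32-byte aligned, b[-1] is readable, and a does not overlap b. The scalar branch is only there so the sketch still builds without AVX2 enabled.

```cpp
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: a[0..3] += b[0..3] + b[-1..2].
// Assumes a and b are 32-byte aligned and b[-1] is readable.
inline void add_shift_mask_scalar(double* a, const double* b) {
#if defined(__AVX2__)
    __m256d va = _mm256_load_pd(a);
    __m256d vb = _mm256_load_pd(b);                    // [b0, b1, b2, b3]
    va = _mm256_add_pd(va, vb);
    // Move each element one lane to the right: [b0, b0, b1, b2]
    __m256d sh = _mm256_permute4x64_pd(vb, _MM_SHUFFLE(2, 1, 0, 0));
    // Zero lane 0, which has no valid source inside vb: [0, b0, b1, b2]
    sh = _mm256_blend_pd(sh, _mm256_setzero_pd(), 0x1);
    va = _mm256_add_pd(va, sh);
    _mm256_store_pd(a, va);
    a[0] += b[-1];                                     // scalar fix-up for lane 0
#else
    // Scalar fallback so the sketch still builds without AVX2.
    for (int j = 0; j < 4; ++j) a[j] += b[j] + b[j - 1];
#endif
}
```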
Alternatively, I could load the vector to the left of b, blend and shuffle it with b to get c, and add that to a. But from what I have read, shuffle instructions seem expensive, so I am wondering whether there is a better way to get (or add) a vector that is shifted by n elements relative to a vector I have already loaded into a register.
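For reference, the blend-and-shuffle variant could look roughly like this for a shift of one element. Again the helper name `add_blend_permute` is made up, and I am assuming a and b are 32-byte aligned, the aligned vector at b - 4 is readable, and a does not overlap b. The scalar branch only exists so the sketch builds without AVX2 enabled.

```cpp
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: a[0..3] += b[0..3] + b[-1..2].
// Assumes a and b are 32-byte aligned and b[-4..-1] is readable.
inline void add_blend_permute(double* a, const double* b) {
#if defined(__AVX2__)
    __m256d va   = _mm256_load_pd(a);
    __m256d cur  = _mm256_load_pd(b);      // [b0,  b1,  b2,  b3 ]
    __m256d prev = _mm256_load_pd(b - 4);  // [b-4, b-3, b-2, b-1]
    // Take lane 3 from prev and lanes 0..2 from cur: [b0, b1, b2, b-1]
    __m256d mix = _mm256_blend_pd(cur, prev, 0x8);
    // Rotate one lane to the right to form c: [b-1, b0, b1, b2]
    __m256d c = _mm256_permute4x64_pd(mix, _MM_SHUFFLE(2, 1, 0, 3));
    va = _mm256_add_pd(va, cur);
    va = _mm256_add_pd(va, c);
    _mm256_store_pd(a, va);
#else
    // Scalar fallback so the sketch still builds without AVX2.
    for (int j = 0; j < 4; ++j) a[j] += b[j] + b[j - 1];
#endif
}
```

Inside a loop over the whole array, `prev` would simply be the `cur` of the previous iteration, so this costs one blend and one lane-crossing permute per vector rather than an extra load.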