Using NEON/SIMD to optimize a Vector3 class


I'd like to know whether it is worth optimizing my Vector3 class's operations with NEON/SIMD, as I did for my Vector2 class.

As far as I know, SIMD can only handle two or four floats at a time, so for my Vector3 we would need something like this:

Vector3 Vector3::operator * (const Vector3& v) const
{
    #if defined(__ARM_NEON__)
        // extra step: allocate a fourth float
        const float v4A[4] = {x, y, z, 0};
        const float v4B[4] = {v.x, v.y, v.z, 0};

        // vmulq_f32 is the four-lane multiply; vmul_f32 only accepts float32x2_t
        float32x4_t r = vmulq_f32(*(float32x4_t*)v4A, *(float32x4_t*)v4B);
        return *(Vector3*)&r;
    #else
        return Vector3(x * v.x, y * v.y, z * v.z);
    #endif
}

Is this safe? Would this extra step still be faster than non-SIMD code in most scenarios (say, on arm64)?
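
For comparison, here is a minimal sketch of the same operator written with explicit NEON loads and stores instead of pointer casts. The casts in the snippet above look risky to me for two reasons: a plain `float[4]` is not guaranteed to have the alignment of `float32x4_t`, and `*(Vector3*)&r` reinterprets a 16-byte vector as a 12-byte struct. The `Vector3` layout here (three public floats and a three-argument constructor) is an assumption, since the real class is not shown. Whether this ends up faster than the scalar version would have to be measured on the target.

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for the real class: three public floats.
struct Vector3 {
    float x, y, z;

    Vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    // Component-wise multiply.
    Vector3 operator*(const Vector3& v) const {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        // Pad both operands to four lanes; the fourth lane is ignored.
        const float a[4] = {x, y, z, 0.0f};
        const float b[4] = {v.x, v.y, v.z, 0.0f};

        // vld1q_f32 loads four floats from a float pointer,
        // so no cast to float32x4_t* is needed.
        const float32x4_t r = vmulq_f32(vld1q_f32(a), vld1q_f32(b));

        // Store all four lanes to a temporary and copy out the first three,
        // rather than reinterpreting the vector as a Vector3.
        float out[4];
        vst1q_f32(out, r);
        return Vector3(out[0], out[1], out[2]);
#else
        // Scalar fallback for targets without NEON.
        return Vector3(x * v.x, y * v.y, z * v.z);
#endif
    }
};
```

The padding, load, store, and copy are all extra work around a single multiply, so timing both paths in a real workload seems like the only reliable way to answer the performance part.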
