I'm working with the DirectXMath (formerly XNAMath) library (defined in the DirectXMath.h header of the Windows SDK), as it appears to be very performant and offers everything needed for physics and rendering. However, I find it quite verbose (using XMStoreFloatX and XMLoadFloatX everywhere is tiring).
I am trying to make it a little easier to use and came up with the idea of hiding the Stores/Loads in assignment operators and conversion operators. As both of these are required to be member functions, I came up with this code as an example:
struct Vector2F : public DirectX::XMFLOAT2 {
    inline Vector2F() : DirectX::XMFLOAT2() {}
    inline Vector2F(float x, float y) : DirectX::XMFLOAT2(x, y) {}
    inline Vector2F(float const* pArray) : DirectX::XMFLOAT2(pArray) {}
    inline Vector2F(DirectX::XMVECTOR vector) {
        DirectX::XMStoreFloat2(this, vector);
    }
    inline Vector2F& __vectorcall operator=(DirectX::XMVECTOR vector) {
        DirectX::XMStoreFloat2(this, vector);
        return *this;
    }
    inline __vectorcall operator DirectX::XMVECTOR() const {
        return DirectX::XMLoadFloat2(this);
    }
};
As you can see, it replicates the public interface of XMFLOAT2 and adds a constructor, an assignment operator, and a conversion operator to XMVECTOR, which is the SIMD type DirectXMath uses for calculations. I intend to do this for every storage struct DirectXMath offers.
Performance is a really important factor for a math library, thus my question is: What are the performance implications of such an inheritance? Is there any additional code generated (assuming full optimization, of course) compared to normal usage of the library?
Intuitively I would say that the generated code should be exactly the same as when I'm using the verbose variant without these convenience operators, as I am essentially just renaming structs and functions. But maybe there are aspects I don't know about?
P.S. I'm a little concerned about the return type of the assignment operator, as it adds additional code. Would it be a good idea to omit the reference return (i.e., return void) to optimize it?
If you find that DirectXMath is a little too verbose for your tastes, take a look at SimpleMath in the DirectX Tool Kit; in particular, the Vector2 class.
The main reason that DirectXMath is so verbose in the first place is to make it very clear to the programmer when you are 'spilling to memory', as this tends to negatively impact the performance of SIMD code. When I moved from XNAMath to DirectXMath, I had considered adding something like the implicit conversions I used for SimpleMath, but I wanted to make sure that any such "C++ magic" was opt-in and never a surprise for a performance-sensitive developer. SimpleMath also acts a bit like training wheels, making it easier to port existing code that is not alignment-aware and morph it into something more SIMD-friendly over time.
The real performance issue with SimpleMath (and your wrapper) is that each function implementation has to do an explicit Load & Store around what is otherwise a fairly small amount of SIMD. Ideally in optimized code it would all get merged away, but in debug code they are always there. For any real performance benefit from SIMD, you want to have long runs of in-register SIMD operations between each Load & Store pair.
Another implication is that parameter passing a wrapper like Vector2 or your Vector2F will never be particularly efficient. The whole reason that XMVECTOR is a typedef for __m128 rather than a struct, and the reason FXMVECTOR, GXMVECTOR, HXMVECTOR, and CXMVECTOR exist, is to try to optimize all the possible calling-convention scenarios and, in the best case, get in-register passing behavior (if things don't inline). See MSDN. Really, the best you can do with Vector2 is to consistently pass it by const& to minimize temporaries and stack copies.