I have a vertex shader output structure which is then used as the input of the fragment shader in Metal:

struct VertexOut {
    float4 position [[position]];
    float2 texture_coordinates;
    half3 texture_color;
    ushort texture_index;
};
vertex VertexOut vertex_function(/* ... */) {
    // ...
}
fragment half4 fragment_function(const VertexOut vertex_in [[stage_in]], const texture2d_array<half> texture [[texture(0)]]) {
    constexpr sampler texture_sampler(mip_filter::linear, address::repeat);
    const half4 color = texture.sample(texture_sampler, vertex_in.texture_coordinates, vertex_in.texture_index);
    return half4(color.rgb*vertex_in.texture_color, color.a);
}

However, I noticed that float4 has 16-byte alignment, and that the members of VertexOut would fit in exactly 2*sizeof(float4) = 32 bytes if they were tightly packed. Because half3 occupies 8 bytes (6 bytes of data plus 2 padding bytes), texture_index lands at offset 32 and the structure ends up 16 bytes bigger than it needs to be: 2 bytes from the padding inside half3, and another 14 bytes of trailing padding to round the size up to a multiple of the largest member alignment (16, from float4).
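To double-check that arithmetic, here is a rough host-side sketch in plain C++ (not Metal). The Float4, Float2, Half and Half3 types are hypothetical stand-ins that I declared to mimic the sizes and alignments of the Metal types, and it assumes a 4-byte float. It only models how such a struct would be laid out in ordinary memory; whether the GPU actually stores interpolated values this way is exactly what I am unsure about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins that mimic the size/alignment of the Metal types.
struct alignas(16) Float4 { float x, y, z, w; };   // float4: size 16, align 16
struct alignas(8)  Float2 { float x, y; };         // float2: size 8,  align 8
using Half = std::uint16_t;                        // half:   size 2,  align 2
struct alignas(8)  Half3  { Half x, y, z; };       // half3:  size 8,  align 8

// Layout of the original structure, with the color as a half3.
struct VertexOutVector {
    Float4 position;             // offset 0
    Float2 texture_coordinates;  // offset 16
    Half3 texture_color;         // offset 24, 6 data bytes + 2 padding bytes
    std::uint16_t texture_index; // offset 32, pushed past the 32-byte mark
};

// Layout of the workaround, with three individual half values.
struct VertexOutScalars {
    Float4 position;                 // offset 0
    Float2 texture_coordinates;      // offset 16
    Half color_r, color_g, color_b;  // offsets 24, 26, 28
    std::uint16_t texture_index;     // offset 30
};

static_assert(sizeof(Half3) == 8, "half3 stand-in carries 2 padding bytes");
static_assert(offsetof(VertexOutVector, texture_index) == 32, "index spills over");
static_assert(sizeof(VertexOutVector) == 48, "rounded up to a multiple of 16");
static_assert(sizeof(VertexOutScalars) == 32, "exactly 2 * sizeof(float4)");

// Bytes that the half3 version spends beyond the scalar version.
inline std::size_t wasted_bytes() {
    return sizeof(VertexOutVector) - sizeof(VertexOutScalars);
}

// Prints both sizes and the difference.
inline void print_layouts() {
    std::printf("half3 version:   %zu bytes\n", sizeof(VertexOutVector));
    std::printf("scalars version: %zu bytes\n", sizeof(VertexOutScalars));
    std::printf("difference:      %zu bytes\n", wasted_bytes());
}
```

Calling print_layouts() reports 48 bytes, 32 bytes, and a difference of 16 bytes, which matches the numbers above under these assumptions.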

I assume that these extra 16 bytes per interpolated pixel will add up to significant memory usage.

To get around this I tried using packed_half3 texture_color; instead. But this would not compile because packed_* vectors, matrices, arrays, and other types are not allowed to be in the vertex output/fragment input.

I assume there are technical reasons, that I do not know of, for this restriction.

But I could of course get around this restriction by using 3 individual half values:

struct VertexOut {
    float4 position [[position]];
    float2 texture_coordinates;
    half color_r, color_g, color_b;
    ushort texture_index;
};

And then I could pack these individual values into a half3 inside the fragment shader.

That brings us to my question(s):

  • What is actually going on behind the scenes preventing using packed vectors here? I assume this is not just some arbitrary restriction. Why would I be able to pass multiple individual values but not an array or packed vector? (I am not looking for opinion-based speculation or rationale. Is there actually some technical reason for this?).
  • Are the fragment data structures somehow represented in a way that prevents them from containing types that are not properly sized or aligned? The only logical reason I can imagine is that, rather than being stored as a buffer of structures, each field has its own separate buffer, and the fields are then combined behind the scenes via the [[stage_in]] qualifier. Is this wild guess about the internals correct, or close to correct? I do not know where to look for documentation on how Metal handles the vertex structures internally.
  • If the restriction can be bypassed so easily by passing individual values, do those reasons (whatever they may be) still apply? Is there a catch or performance penalty to passing individual values in order to get around the restriction on packed vectors, arrays, matrices, etc.? And would doing so provide any of the memory-saving benefits I thought it would?