I'm writing a compute shader that outputs an unknown amount of data into a storage buffer. There is a theoretical upper bound, but it is huge compared to the values I expect in practice.
I have found one way to handle this: first run a simplified version of the shader that only counts the number of output values, then use that count to allocate a sufficiently large buffer and run the real shader, which writes into it.
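To make the two-pass idea concrete, here is a rough CPU-side sketch in C++. It is not actual shader or API code; the names `outputsFor`, `countPass`, and `writePass` are hypothetical, and the `i % 3` rule is just a placeholder for the data-dependent amount of output each invocation produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-invocation work: invocation i emits a
// data-dependent number of values (i % 3 here, purely for illustration).
static std::uint32_t outputsFor(std::uint32_t i) { return i % 3; }

// Pass 1: the simplified shader, which only counts the output values.
std::size_t countPass(std::uint32_t invocations) {
    std::size_t count = 0;  // on the GPU this would be a counter in a buffer
    for (std::uint32_t i = 0; i < invocations; ++i) {
        count += outputsFor(i);
    }
    return count;
}

// Pass 2: the real shader, writing into a buffer sized from pass 1.
std::vector<std::uint32_t> writePass(std::uint32_t invocations, std::size_t count) {
    std::vector<std::uint32_t> out(count);  // the exactly sized allocation
    std::size_t cursor = 0;
    for (std::uint32_t i = 0; i < invocations; ++i) {
        for (std::uint32_t k = 0; k < outputsFor(i); ++k) {
            out[cursor++] = i;  // placeholder payload
        }
    }
    assert(cursor == count);  // both passes must agree on the total
    return out;
}
```

Calling `writePass(n, countPass(n))` mirrors the two submissions: the same work is traversed twice, and the count has to come back to the host before the second buffer can be allocated.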
But that feels inefficient to me: recording and submitting two command buffers in succession, both doing essentially the same work, and reading the counter back from the GPU in between adds a rather large overhead. Is there a better way to do this? From what I have read, dynamic storage buffers might be a solution, but I can't find much information on how they work (an example would be nice) or what they are actually intended for.