In my app, I have a transform hierarchy of objects. All the objects are instances of a single object, and there might be a large number of them. Any group in the hierarchy can have any number of instances or sub-groups.
Aside from the transforms, the instances can potentially be batched into a single draw call, and per-instance attributes can live in an attribute array with an instance stride.
I would like to specify the transforms for the objects in an efficient way, but I'm not sure of the best approach. Here's what I've considered:
- One matrix per instance, in an attribute array with a divisor of 1.
  - This seems wasteful. If I have (say) 1000 objects in a single group, then I'll specify the same matrix 1000 times. When the parent transform changes, I'll also have to go update 1000 attributes instead of 1.
- Break into multiple draw calls, one for each leaf group, and specify the transform as a uniform at the start of each call.
  - This is less wasteful, but seems like it might be slower due to a larger number of draw calls. Groups with small numbers of instances might be inefficient due to poor batching.
  - This isn't possible in OpenGL ES anyway (WebGL specifically), since I'd need to either (a) offset the starting instance for each call, which AFAICT requires a call to glDrawElementsInstancedBaseInstance(), or (b) build entirely separate attribute arrays for each group in the hierarchy, which also has more overhead + complexity.
- Use a uniform array of transforms, and specify a per-instance index into it.
  - This seems untenable because it looks like the maximum size of uniform arrays is very small.
- Use a uniform buffer object to hold an array of transforms, and specify a per-instance index into the array.
  - This seems better, though it still looks like the maximum array size isn't very large.
- Use an SSBO to do the above.
  - Looks like this isn't available in OpenGL ES 3.0.
- Something else?
What would be considered best practice for this task? Is there perhaps a method I haven't considered?
If you are repeating the same matrix you already have problems. Ultimately you need a unique matrix per instance because you need each instance to be transformed to a unique location on screen.
Yes, a parent change means updating the matrix of every instance under it. So? Matrices are small and very cheap to update.
It's a lot more efficient to update 1 matrix per instance on the CPU than perform a redundant tree of matrix fetches and calculations per vertex on the GPU.
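To make the CPU-side update concrete, here is a minimal sketch of flattening the hierarchy into one world matrix per instance. The `Group` shape, the function names, and the choice to give each instance its own local matrix are assumptions for illustration, not something taken from the question; matrices are column-major, 16 floats each, as WebGL expects.

```typescript
// A 4x4 matrix stored column-major as 16 numbers.
type Mat4 = ArrayLike<number>;

// Hypothetical hierarchy node: a local transform, the local transforms of
// the instances placed directly in this group, and any sub-groups.
interface Group {
  local: Mat4;
  instances: Mat4[];
  children: Group[];
}

const IDENTITY: Mat4 = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];

// Writes a * b into out starting at offset (column-major convention).
// out must not alias a or b.
function multiplyInto(a: Mat4, b: Mat4, out: Float32Array, offset: number): void {
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[offset + col * 4 + row] = sum;
    }
  }
}

function countInstances(group: Group): number {
  let total = group.instances.length;
  for (const child of group.children) total += countInstances(child);
  return total;
}

// Walks the tree depth-first and returns one world matrix per instance,
// packed back to back. Each group's world matrix is computed once and
// reused for all of its instances and sub-groups.
function flattenHierarchy(root: Group): Float32Array {
  const out = new Float32Array(countInstances(root) * 16);
  let next = 0;
  const visit = (group: Group, parentWorld: Mat4): void => {
    const world = new Float32Array(16);
    multiplyInto(parentWorld, group.local, world, 0);
    for (const instanceLocal of group.instances) {
      multiplyInto(world, instanceLocal, out, next);
      next += 16;
    }
    for (const child of group.children) visit(child, world);
  };
  visit(root, IDENTITY);
  return out;
}
```

The returned array can then be uploaded in one go, for example with `gl.bufferSubData`. A real app would probably re-walk only the subtree whose parent changed rather than the whole tree; this sketch skips that for brevity.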
If possible I'd try to avoid attributes and instance divisors for per-instance data. On a lot of GPUs this forces per-vertex refetches, because the shader compiler doesn't know at compile time that the attribute has a per-instance divisor.
If possible store what you need in an instance-count sized array in a uniform buffer or storage buffer, and index into that array using gl_InstanceID to fetch the matrix you need. UBO limits can be a problem, so if you need a lot of objects, then you'll need to partition the draws into chunks. (That said, benchmark this, it's going to be vendor dependent).
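As a rough sketch of the chunking, the helper below splits the packed matrix buffer into ranges that each fit in one uniform block. `planChunks`, `makeVertexShader`, and the shader's `Transforms`, `model`, `viewProj` and `position` names are all hypothetical. The limits are meant to come from `gl.getParameter(gl.MAX_UNIFORM_BLOCK_SIZE)` and `gl.getParameter(gl.UNIFORM_BUFFER_OFFSET_ALIGNMENT)`, and the code assumes the alignment is a power of two.

```typescript
const MAT4_BYTES = 64; // 16 floats * 4 bytes; also the std140 stride of a mat4 array

interface Chunk {
  firstInstance: number; // index of the first instance drawn by this chunk
  instanceCount: number; // number of instances to draw in this chunk
  byteOffset: number;    // offset of the chunk's matrices in the buffer
  byteLength: number;    // size of the range to bind (always one full block)
}

// Splits totalInstances into draws whose matrices fit in one uniform block.
// Assumes offsetAlignment is a power of two, so rounding the chunk size to a
// whole number of aligned matrices keeps every byteOffset aligned.
function planChunks(totalInstances: number, maxBlockBytes: number, offsetAlignment: number): Chunk[] {
  const matricesPerAlign = Math.max(1, Math.ceil(offsetAlignment / MAT4_BYTES));
  let perChunk = Math.floor(maxBlockBytes / MAT4_BYTES);
  perChunk -= perChunk % matricesPerAlign;
  if (perChunk <= 0) throw new Error("uniform block too small for one aligned chunk");

  const chunks: Chunk[] = [];
  for (let first = 0; first < totalInstances; first += perChunk) {
    chunks.push({
      firstInstance: first,
      instanceCount: Math.min(perChunk, totalInstances - first),
      byteOffset: first * MAT4_BYTES,
      // Bind a full block even for the last, partial chunk; the buffer
      // should be allocated padded to a multiple of the block size.
      byteLength: perChunk * MAT4_BYTES,
    });
  }
  return chunks;
}

// Builds a GLSL ES 3.00 vertex shader whose block holds perChunk matrices.
// gl_InstanceID restarts at 0 for every draw call, so it indexes directly
// into whichever range is currently bound.
function makeVertexShader(perChunk: number): string {
  return `#version 300 es
layout(std140) uniform Transforms { mat4 model[${perChunk}]; };
uniform mat4 viewProj;
in vec3 position;
void main() {
  gl_Position = viewProj * model[gl_InstanceID] * vec4(position, 1.0);
}`;
}
```

For each chunk you would then call `gl.bindBufferRange(gl.UNIFORM_BUFFER, bindingIndex, buffer, chunk.byteOffset, chunk.byteLength)` followed by `gl.drawElementsInstanced(...)` with `chunk.instanceCount`. Because each draw binds a different range, no base-instance call is needed. With the ES 3.0 minimum block size of 16384 bytes this gives 256 matrices per draw; many desktop drivers report 64 KB or more.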