I'm working on a heightmap erosion compute shader in Unity, where each point on the map is eroded separately. This works well for small maps, but my project requires 4096x4096 maps, which means 4096^2 = 16777216 points to simulate. With the default thread dimensions of [64,1,1], this creates 262144 thread groups, far more than the allowed limit of 65535.
My questions are:
Can I simply raise the thread dimensions, and what do I have to consider in terms of performance when I do?
Is it maybe possible to simply run the shader multiple times, with different ranges of heightmap coordinates?
This is my first time working with shaders. The tutorials I've seen online quickly go too deep into GPU hardware specifications, so I didn't pick up much from them.
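With `8x8` threads per work group, you can `Dispatch` `512x512` work groups to do what you need. Remember that `8x8` threads are invoked for each work group you dispatch, so you will have `512x512 work groups x 8x8 threads` = `4096 x 4096 threads` executed. Two limits matter here on Direct3D 11 class targets: the product of the three `numthreads` values cannot exceed 1024 (so `64x64` threads per group is not possible), and the 65535 limit applies to each dispatch dimension separately, so spreading the groups over X and Y keeps you well below it.
As for the performance implications, the general answer is "try it out!": run your kernel with different sizes for threads and work groups. The results may vary depending on your computations and on your hardware.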
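To make the 2D dispatch concrete, here is a minimal C# sketch of the Unity side. It is only an illustration under assumptions: the kernel name `Erode`, the shader properties `HeightMap` and `MapSize`, and the class name are hypothetical, and the kernel is assumed to be declared with `[numthreads(8,8,1)]`.

```csharp
using UnityEngine;

// Hypothetical driver script: "Erode", "HeightMap" and "MapSize" are assumed
// names that must match whatever your own compute shader declares.
public class ErosionDispatcher : MonoBehaviour
{
    public ComputeShader erosionShader; // kernel assumed to use [numthreads(8,8,1)]
    public RenderTexture heightMap;     // assumed 4096x4096, created with enableRandomWrite = true

    const int MapSize = 4096;

    void Start()
    {
        int kernel = erosionShader.FindKernel("Erode");

        // Read the numthreads values back from the kernel instead of hard-coding them.
        uint threadsX, threadsY, threadsZ;
        erosionShader.GetKernelThreadGroupSizes(kernel, out threadsX, out threadsY, out threadsZ);

        // Round up so the whole map is covered even if MapSize is not a multiple
        // of the group size. For 4096 and 8 this gives 512 groups per axis.
        int groupsX = (MapSize + (int)threadsX - 1) / (int)threadsX;
        int groupsY = (MapSize + (int)threadsY - 1) / (int)threadsY;

        erosionShader.SetTexture(kernel, "HeightMap", heightMap);
        erosionShader.SetInt("MapSize", MapSize);

        // 512 x 512 x 1 groups: each dimension stays far below 65535.
        erosionShader.Dispatch(kernel, groupsX, groupsY, 1);
    }
}
```

Inside the kernel, the `SV_DispatchThreadID` value then gives the x and y coordinates of the point to erode. If the map size is not a multiple of the group size, the kernel should return early for coordinates outside the map.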
But, in case you need to bypass the `65535` limit, you can use `DispatchIndirect`. Basically, it's the same as `Dispatch`, but the arguments are passed through a `ComputeBuffer`.
PS: working on a GPU requires understanding its architecture because (1) you work at a low level, close to the hardware, and many of the features you work with are actually implemented in hardware (e.g. textures); (2) you want to get the best performance out of your programs (e.g. make the best use of blocks, warps and cache) ;)
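For the `DispatchIndirect` route mentioned above, a rough sketch could look like the following. As before, `Erode`, `HeightMap` and the class name are hypothetical, and the group counts are just the ones from the 4096x4096 example. Whether this actually lifts the per-dimension limit depends on the graphics API, so treat that part as something to verify on your target platform.

```csharp
using UnityEngine;

// Hypothetical example: same assumed kernel and property names as a regular Dispatch.
public class IndirectErosionDispatcher : MonoBehaviour
{
    public ComputeShader erosionShader; // kernel assumed to use [numthreads(8,8,1)]
    public RenderTexture heightMap;     // assumed 4096x4096, created with enableRandomWrite = true

    ComputeBuffer argsBuffer;

    void Start()
    {
        int kernel = erosionShader.FindKernel("Erode");
        erosionShader.SetTexture(kernel, "HeightMap", heightMap);

        // The argument buffer holds three uints: the group counts in X, Y and Z.
        argsBuffer = new ComputeBuffer(3, sizeof(uint), ComputeBufferType.IndirectArguments);
        argsBuffer.SetData(new uint[] { 512, 512, 1 });

        // Same work as Dispatch(kernel, 512, 512, 1), but the counts are read
        // from the buffer, starting at byte offset 0.
        erosionShader.DispatchIndirect(kernel, argsBuffer, 0);
    }

    void OnDestroy()
    {
        // Compute buffers are not garbage collected and must be released manually.
        if (argsBuffer != null)
        {
            argsBuffer.Release();
        }
    }
}
```

The main practical benefit of this form is that another compute shader can write the group counts into the buffer, so the CPU never has to read them back.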