While investigating the warp occupancy of my G-buffer pass, I found that even if I simplify the scene and the shader, Nsight still reports very low warp occupancy, sometimes even much lower than the original: there are basically always fewer than 10 pixel warps, and "Active SM unused warp slots" stays around 80%.
The shaders use 30+ registers. Nsight shows almost no ISBE or TRAM allocation, but register allocation is always above 90%. Given that only a few pixel warps are active, such high register usage seems very strange.
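To show why the numbers look inconsistent to me, here is a rough back-of-the-envelope sketch of the arithmetic. The hardware constants (65536 registers per SM, 64 warp slots per SM, per-thread register counts rounded up to a multiple of 8) are my assumptions for illustration, not values read from Nsight, and they differ between GPU architectures.

```python
# Rough estimate only: every hardware constant below is an assumption
# for illustration and varies between GPU architectures.
REGS_PER_SM = 65536         # assumed number of 32-bit registers per SM
THREADS_PER_WARP = 32       # threads in one warp
MAX_WARPS_PER_SM = 64       # assumed warp-slot limit per SM
REG_ALLOC_GRANULARITY = 8   # assumed rounding of per-thread register count


def regs_per_warp(regs_per_thread):
    """Registers reserved by one warp after rounding up to the granularity."""
    rounded = -(-regs_per_thread // REG_ALLOC_GRANULARITY) * REG_ALLOC_GRANULARITY
    return rounded * THREADS_PER_WARP


def register_limited_warps(regs_per_thread):
    """Upper bound on resident warps if registers were the only limit."""
    by_registers = REGS_PER_SM // regs_per_warp(regs_per_thread)
    return min(MAX_WARPS_PER_SM, by_registers)


def register_file_fraction(active_warps, regs_per_thread):
    """Fraction of the register file held by the given number of warps."""
    return active_warps * regs_per_warp(regs_per_thread) / REGS_PER_SM


if __name__ == "__main__":
    print(register_limited_warps(30))
    print(register_file_fraction(10, 30))
```

Under these assumed constants, 10 warps at 30 registers per thread would hold only about 16% of the register file, which is nowhere near the 90+% that Nsight reports. So either my assumptions are wrong for this GPU, or something other than the visible pixel warps is holding registers.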
So what could cause this behaviour, and how can I improve my warp occupancy? Cutting down registers actually made the occupancy worse...