First Question
The code snippet in the docs calls DrawIndexedInstanced in a for loop:
for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}
But the API
void DrawIndexedInstanced(
    [in] UINT IndexCountPerInstance,
    [in] UINT InstanceCount,
    [in] UINT StartIndexLocation,
    [in] INT  BaseVertexLocation,
    [in] UINT StartInstanceLocation
);
void DrawInstanced(
    [in] UINT VertexCountPerInstance,
    [in] UINT InstanceCount,
    [in] UINT StartVertexLocation,
    [in] UINT StartInstanceLocation
);
has StartInstanceLocation and InstanceCount parameters, which I assume have the effect of offsetting by InstanceIndex*StartInstanceLocation.
So are the following equivalent?
DrawIndexedInstanced(100, 2, 0, 0, 100);
// vs
DrawIndexedInstanced(100, 1, 0, 0, 0);
DrawIndexedInstanced(100, 1, 100, 0, 0);

DrawInstanced(100, 2, 0, 100);
// vs
DrawInstanced(100, 1, 0, 0);
DrawInstanced(100, 1, 100, 0);
Second Question
How does instancing improve performance in the D3D12Bundles sample referred to by the docs? They call SetPipelineState between each instance, and the constant buffer used for g_mWorldViewProj in the vertex shader also changes for each instance. How does anything get reused?
for (UINT i = 0; i < m_cityRowCount; i++) {
    for (UINT j = 0; j < m_cityColumnCount; j++) {
        // Alternate which PSO to use; the pixel shader is different on
        // each just as a PSO setting demonstration.
        pCommandList->SetPipelineState(usePso1 ? pPso1 : pPso2);
        usePso1 = !usePso1;
        // Set this city's CBV table and move to the next descriptor.
        pCommandList->SetGraphicsRootDescriptorTable(2, cbvSrvHandle);
        cbvSrvHandle.Offset(cbvSrvDescriptorSize);
        pCommandList->DrawIndexedInstanced(numIndices, 1, 0, 0, 0);
    }
}
The canonical sample for instancing is InstancingFX11, rather than D3D12Bundles, which the docs of DrawIndexedInstanced() link to and which benefits only from the indexing, not the instancing. The writer of the InstancingFX11 sample wrote some comments on how to use instancing properly.
The key to performance, then, is that the first vertex buffer (containing the actual vertices) stays the same for each instance, while only the second vertex buffer (containing the different World translation matrices) strides between instances.
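To make the two-buffer idea concrete, here is a rough CPU-side model of it in plain C++. This is only a sketch and not code from InstancingFX11: the names Float3 and modelInstancedDraw are made up for illustration, and a simple translation vector stands in for the per-instance World matrix. The model also assumes a step rate of at least 1.

```cpp
#include <cassert>
#include <vector>

struct Float3 { float x, y, z; };

// Hypothetical model of an instanced draw with two vertex streams:
//   meshVertices    -> per-vertex data, identical for every instance
//   instanceOffsets -> per-instance data, advanced once every
//                      instanceDataStepRate instances
// Returns the positions a vertex shader would produce if it simply
// added the per-instance offset to each vertex position.
std::vector<Float3> modelInstancedDraw(const std::vector<Float3>& meshVertices,
                                       const std::vector<Float3>& instanceOffsets,
                                       unsigned instanceCount,
                                       unsigned instanceDataStepRate = 1)
{
    // Restriction of this sketch: only step rates of 1 or more are handled.
    assert(instanceDataStepRate > 0);
    // Every instance must map to an existing per-instance element.
    assert(instanceCount == 0 ||
           (instanceCount - 1) / instanceDataStepRate < instanceOffsets.size());

    std::vector<Float3> out;
    for (unsigned inst = 0; inst < instanceCount; ++inst) {
        // The per-instance stream only moves when the instance changes.
        const Float3& offset = instanceOffsets[inst / instanceDataStepRate];
        // The per-vertex stream is re-read from the start for each instance.
        for (const Float3& v : meshVertices) {
            out.push_back({v.x + offset.x, v.y + offset.y, v.z + offset.z});
        }
    }
    return out;
}
```

Calling modelInstancedDraw(mesh, offsets, 2) walks the same mesh twice and only swaps the offset between the two passes, which is the reuse the sample's author describes: one draw call, one copy of the mesh, and a small per-instance buffer.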
Re: first question
Seems like I was completely wrong about the first question. The two are not equivalent. In the INPUT_ELEMENT_DESC structure, the member InstanceDataStepRate means that the D3D11_INPUT_PER_VERTEX_DATA vertex buffers will not differ between instances. So sequential instanced draw calls do not combine into one instanced draw, whoops.
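For anyone who wants to see the difference spelled out, below is a small CPU-side model of which buffer elements each call reads. It is a sketch under assumptions, not the driver's behaviour verbatim: Fetch and modelDrawIndexedInstanced are hypothetical names, the per-instance buffer is assumed to use a step rate of 1, and the instance counter is modelled as restarting at 0 for every draw.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One element fetch made while processing a single index of a draw.
struct Fetch {
    uint32_t instanceId;        // 0-based instance counter within this draw
    uint32_t indexSlot;         // element of the index buffer that was read
    int64_t  vertexSlot;        // element of the per-vertex buffer that was read
    uint32_t instanceDataSlot;  // element of the per-instance buffer that was read
};

// Hypothetical model of the addressing done for DrawIndexedInstanced.
// Assumption: per-instance data is stepped once per instance.
std::vector<Fetch> modelDrawIndexedInstanced(const std::vector<uint32_t>& indexBuffer,
                                             uint32_t indexCountPerInstance,
                                             uint32_t instanceCount,
                                             uint32_t startIndexLocation,
                                             int32_t  baseVertexLocation,
                                             uint32_t startInstanceLocation)
{
    // The index range must lie inside the bound index buffer.
    assert(uint64_t(startIndexLocation) + indexCountPerInstance <= indexBuffer.size());

    std::vector<Fetch> fetches;
    for (uint32_t inst = 0; inst < instanceCount; ++inst) {
        for (uint32_t k = 0; k < indexCountPerInstance; ++k) {
            // Every instance re-reads the same index range...
            const uint32_t slot = startIndexLocation + k;
            // ...and BaseVertexLocation is added to each index that was read.
            const int64_t vertex = int64_t(indexBuffer[slot]) + baseVertexLocation;
            // Only the per-instance stream is offset by StartInstanceLocation.
            fetches.push_back({inst, slot, vertex, startInstanceLocation + inst});
        }
    }
    return fetches;
}
```

With the arguments from the question, the single call (100, 2, 0, 0, 100) reads index slots 0..99 twice, using per-instance elements 100 and 101. The two separate calls read index slots 0..99 and then 100..199, both with per-instance element 0. So StartInstanceLocation offsets into the per-instance data and never into the index or vertex data.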