For example, I have a test program for 5 kernels:
int main()
{
    for (int i = 0; i < 10; i++) {
        kernel_1<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++) {
        kernel_1<<<...>>>(...); // to be measured
    }
    ...
    for (int i = 0; i < 10; i++) {
        kernel_5<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++) {
        kernel_5<<<...>>>(...); // to be measured
    }
    return 0;
}
Each kernel runs 20 times, but only the last 10 runs need to be measured, and I need the average time/usage/statistics over those 10 runs. How can I do this gracefully with the ncu command line? Should I use cudaProfilerStart() / cudaProfilerStop() to assist?
I want the results written to an Excel file. I am a beginner; thank you for your help.
You could achieve this using multiple Nsight Compute runs:
- kernel_1, but skip the first 10 launches: ncu -s 10 -k kernel_1 ...
- kernel_2, but skip the first 10 launches: ncu -s 10 -k kernel_2 ...
et cætera. Option -s N allows you to skip the first N kernel launches, while taking into account the kernel name filter passed using option -k <kernel-name>.
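On the cudaProfilerStart() question: an alternative to running ncu once per kernel is to bracket only the measured loops with the CUDA profiler API, so that a single run collects all of them. The following is a minimal sketch under stated assumptions, not a tested recipe for your program. The kernel body, the buffer size and the launch configuration are hypothetical stand-ins for your real kernel_1 ... kernel_5.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // declares cudaProfilerStart() and cudaProfilerStop()

// Hypothetical stand-in for kernel_1 from the question.
__global__ void kernel_1(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] = data[idx] * 2.0f + 1.0f;
    }
}

int main()
{
    const int n = 1 << 20;                           // assumed problem size
    const int threads = 256;                         // assumed block size
    const int blocks = (n + threads - 1) / threads;  // enough blocks to cover n

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launches: issued before cudaProfilerStart(), so they are
    // not profiled when profiling from start is disabled.
    for (int i = 0; i < 10; i++) {
        kernel_1<<<blocks, threads>>>(d_data, n);
    }
    cudaDeviceSynchronize();

    // Measured launches: only kernels launched between Start and Stop
    // are collected.
    cudaProfilerStart();
    for (int i = 0; i < 10; i++) {
        kernel_1<<<blocks, threads>>>(d_data, n);
    }
    cudaDeviceSynchronize();
    cudaProfilerStop();

    // Repeat the warm-up / Start / measured / Stop pattern for the
    // remaining kernels.

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

For the Start/Stop calls to have an effect, profiling from application start has to be disabled, for example with ncu --profile-from-start off ./app (./app is a placeholder for your binary). Adding --csv makes ncu print its results as comma-separated values, which Excel can open once redirected to a file. As far as I know, ncu reports each profiled launch separately, so the average over the 10 measured launches would still need to be computed afterwards, for instance in the spreadsheet.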