How to use the ncu command to profile the average time/usage/etc. of a kernel launched 10 times?


For example, I have a test program for 5 kernels:

int main()
{
    for (int i = 0; i < 10; i++){
        kernel_1<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++){
        kernel_1<<<...>>>(...); // to be measured
    }
    ...
    for (int i = 0; i < 10; i++){
        kernel_5<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++){
        kernel_5<<<...>>>(...); // to be measured
    }
    return 0;
}

Each kernel runs 20 times, but only the last 10 launches need to be measured, and I need the average time/usage/statistics over those 10 launches. How can I do this cleanly with the ncu command line? Should I use cudaProfilerStart() / cudaProfilerStop() to assist?

I want the results to be written to an Excel file. I am a beginner; thank you for your help.

There is 1 best solution below

Anis Ladram

You can achieve this with multiple Nsight Compute runs, one per kernel:

  1. Profile kernel_1 but skip its first 10 launches: ncu -s 10 -k kernel_1 ...
  2. Profile kernel_2 but skip its first 10 launches: ncu -s 10 -k kernel_2 ...

and so on for the remaining kernels. The -s N option (--launch-skip) skips the first N kernel launches, counting only launches that match the kernel name filter passed with -k <kernel-name> (--kernel-name).
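
As a rough sketch, the per-kernel runs could be scripted like this. The binary path ./test_program and the output file names are assumptions for illustration, not something from the question. The extra options -c 10 (--launch-count), --csv, and --log-file limit profiling to 10 launches and write the report as CSV, which Excel can open directly.

```shell
#!/usr/bin/env bash
# Sketch: profile only the measured launches of each kernel with Nsight Compute.
# APP is a hypothetical path to the compiled test program; adjust as needed.
APP="./test_program"

# Print the ncu command line for one kernel:
#   -k NAME                profile only kernels with this name
#   -s 10                  skip the first 10 matching launches (the warm-up loop)
#   -c 10                  stop after profiling 10 matching launches (the measured loop)
#   --csv --log-file FILE  write the tool output as CSV into FILE (file name is an assumption)
ncu_command() {
    echo "ncu -k $1 -s 10 -c 10 --csv --log-file $1.csv $APP"
}

for i in 1 2 3 4 5; do
    cmd="$(ncu_command "kernel_$i")"
    echo "$cmd"
    # Run only when ncu is installed; otherwise this loop is just a dry run.
    if command -v ncu >/dev/null 2>&1; then
        $cmd || echo "profiling kernel_$i failed" >&2
    fi
done
```

Each CSV should then hold one set of rows per profiled launch, so the 10 values of a metric can be averaged in Excel with AVERAGE. The file may begin with a few ==PROF== status lines that need to be deleted first. Recent ncu versions also appear to offer --print-summary per-kernel, which, as far as I know, aggregates the profiled launches per kernel; check ncu --help for your version. With this approach, no cudaProfilerStart() / cudaProfilerStop() calls are needed in the program.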