Monitor active warps and threads during a divergent CUDA run

442 Views Asked by Silicomancer At 26 January 2022 at 16:40

I implemented some CUDA code. It runs fine, but the algorithm inherently produces strong thread divergence. This is expected.

I will later try to reduce divergence. But for the moment I would be happy to be able to measure it.

Is there an easy way (preferably using a runtime API call or a CLI tool) to check how many of my initially scheduled warps and/or threads are still active?


There are 2 best solutions below

Silicomancer On 29 March 2022 at 22:02 BEST ANSWER

I found a solution that gives me pretty nice results. Calling the following function from selected lines of a kernel (with the filter condition adapted as needed) prints the number of active threads in the current warp:

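// Prints how many threads of the calling warp are active at the call site.
__device__ void printConvergentThreadCount(int line) // Pass __LINE__
{
   const int count = __popc(__activemask()); // Set bits in the active-lane mask
   const int threadId = blockIdx.x * blockDim.x + threadIdx.x;
   if (threadId == 0) // Filter: only global thread 0 prints
   {
      printf("Line %i: %i\n", line, count);
   }
}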
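For illustration, here is a minimal sketch of how the function might be called from a divergent kernel. The kernel `divergentKernel`, the host helper `runDivergentKernel`, and the even/odd branch are hypothetical and only there to provoke divergence. The helper is repeated so the sketch compiles on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same helper as in the answer, repeated so this sketch is self-contained.
__device__ void printConvergentThreadCount(int line) // Pass __LINE__
{
   const int count = __popc(__activemask());
   const int threadId = blockIdx.x * blockDim.x + threadIdx.x;
   if (threadId == 0) // Filter: only global thread 0 prints
   {
      printf("Line %i: %i\n", line, count);
   }
}

// Hypothetical kernel: even and odd threads take different branches.
__global__ void divergentKernel(int* out)
{
   const int tid = blockIdx.x * blockDim.x + threadIdx.x;
   printConvergentThreadCount(__LINE__); // Probe before the branch
   if (tid % 2 == 0)
   {
      printConvergentThreadCount(__LINE__); // Probe inside the even-thread branch
      out[tid] = tid;
   }
   else
   {
      out[tid] = -tid;
   }
}

// Hypothetical host helper: launches one warp and waits for it to finish.
cudaError_t runDivergentKernel()
{
   int* out = nullptr;
   cudaError_t err = cudaMalloc(&out, 32 * sizeof(int));
   if (err != cudaSuccess) return err;
   divergentKernel<<<1, 32>>>(out);
   err = cudaDeviceSynchronize(); // Device-side printf output is flushed here
   cudaFree(out);
   return err;
}

int main()
{
   const cudaError_t err = runDivergentKernel();
   if (err != cudaSuccess)
   {
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
      return 1;
   }
   return 0;
}
```

The printed counts are a snapshot of which threads happen to execute the call together, so the exact numbers may vary with the GPU architecture and compiler optimizations.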

Still, this does not give me the numbers while the kernels are running.
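A possible variation, sketched here under assumptions, is to accumulate the counts in a device-side histogram instead of printing them, which makes it easier to summarize many warps at once. The names `recordActiveCount`, `divergentKernel`, and `collectHistogram` are hypothetical, and the lane computation assumes one-dimensional thread blocks. Like the printf approach, this reads the numbers only after the kernel has finished.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr int kBins = kWarpSize + 1; // Bins for counts 0..32; bin 0 stays empty

// Hypothetical probe: the lowest active lane of each group of threads that
// execute this call together adds one entry to histogram[group size].
__device__ void recordActiveCount(unsigned int* histogram)
{
   const unsigned int mask = __activemask();
   const int count = __popc(mask);
   const int leader = __ffs(mask) - 1;       // Lowest set bit = lowest active lane
   const int lane = threadIdx.x % kWarpSize; // Assumes 1D thread blocks
   if (lane == leader)
   {
      atomicAdd(&histogram[count], 1u);
   }
}

// Hypothetical kernel: only every fourth thread reaches the probe.
__global__ void divergentKernel(unsigned int* histogram)
{
   const int tid = blockIdx.x * blockDim.x + threadIdx.x;
   if (tid % 4 == 0)
   {
      recordActiveCount(histogram);
   }
}

// Hypothetical host helper: runs the kernel and copies the histogram back.
cudaError_t collectHistogram(unsigned int* hostHistogram)
{
   unsigned int* devHistogram = nullptr;
   cudaError_t err = cudaMalloc(&devHistogram, kBins * sizeof(unsigned int));
   if (err != cudaSuccess) return err;
   err = cudaMemset(devHistogram, 0, kBins * sizeof(unsigned int));
   if (err == cudaSuccess)
   {
      divergentKernel<<<4, 128>>>(devHistogram);
      // The blocking copy on the default stream waits for the kernel to finish.
      err = cudaMemcpy(hostHistogram, devHistogram,
                       kBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
   }
   cudaFree(devHistogram);
   return err;
}

int main()
{
   unsigned int histogram[kBins] = {};
   const cudaError_t err = collectHistogram(histogram);
   if (err != cudaSuccess)
   {
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
      return 1;
   }
   for (int n = 1; n < kBins; ++n)
   {
      if (histogram[n] != 0)
      {
         printf("%2d active threads: seen %u times\n", n, histogram[n]);
      }
   }
   return 0;
}
```

The atomics add some overhead and can perturb timing, so this is probably better suited to a debug build than to production measurements.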

AmirSojoodi On 29 January 2022 at 19:14

Besides the solutions given in the comments, you can use Nsight Compute to profile your kernels. You can try its CLI and then see the results in its GUI, e.g.:

ncu --export output --force-overwrite --target-processes application-only \
  --replay-mode kernel --kernel-regex-base function --launch-skip-before-match 0 \
  --section InstructionStats \
  --section Occupancy \
  --section SchedulerStats \
  --section SourceCounters \
  --section WarpStateStats \
  --sampling-interval auto \
  --sampling-max-passes 5 \
  --profile-from-start 1 --cache-control all --clock-control base \
  --apply-rules yes --import-source no --check-exit-code yes \
  your-application [arguments]

Then, in its GUI you can see some useful information, for example in the Source Counters section.

[Screenshot: Source Counters section in the Nsight Compute GUI]
