I've been working on optimizing some code and ran into an issue with the shared memory bank conflict report from the CUDA Nsight performance analysis. I was able to reduce it to a very simple piece of code that Nsight reports as having a bank conflict, when it doesn't seem there should be one. Below is the kernel:
__global__ void conflict() {
    __shared__ double values[33];
    values[threadIdx.x] = threadIdx.x;
    values[threadIdx.x + 1] = threadIdx.x;
}
And the main function to call it:
int main() {
    conflict<<<1, 32>>>();
}
Note that I am using a single warp to really reduce this to the bare minimum. When I run the code, Nsight says there is 1 bank conflict, but according to everything I have read, there should not be any: for each access to the shared memory array, the threads access consecutive elements, which should each fall in a separate bank.
Has anyone else experienced issues with the reporting of Nsight or am I just missing something with the functioning of bank conflicts? I would appreciate any feedback!
Btw, I am running the following setup:
- Windows 8
- GTX 770
- Visual Studio Community 2013
- CUDA 7
- Nsight Visual Studio Edition Version 4.5
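For reference, the bank arithmetic can be checked by hand. The following is a small sketch (my own, not part of the original question) assuming the cc 3.x default of 32 banks that are 4 bytes wide; under that assumption each 8-byte `double` straddles two banks, so two threads 16 apart in a warp touch the same bank pair:

```python
BANKS = 32        # shared memory banks on cc 3.x
BANK_BYTES = 4    # default bank width (4-byte mode)
DOUBLE_BYTES = 8  # sizeof(double)

def banks_touched(index):
    """Set of banks covered by values[index] for a double array."""
    byte = index * DOUBLE_BYTES
    words = range(byte // BANK_BYTES, (byte + DOUBLE_BYTES) // BANK_BYTES)
    return {w % BANKS for w in words}

print(banks_touched(0))   # thread 0:  {0, 1}
print(banks_touched(16))  # thread 16: {0, 1} -> same bank pair as thread 0
print(banks_touched(1))   # thread 1:  {2, 3}
```

So even though each thread reads a distinct, consecutive element, in 4-byte bank mode the accesses from threads i and i+16 overlap on the same two banks.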
If the intent is to run the posted code as-is, with the `double` data type and no bank conflicts, I believe it's possible with appropriate use of `cudaDeviceSetSharedMemConfig` (on cc 3.x devices). With specification of `EightByteMode`, the shared memory replay overhead is zero.
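A minimal sketch of such a test case (my own reconstruction, assuming a cc 3.x device and the CUDA 7 runtime API; the error-check style is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void conflict() {
    __shared__ double values[33];
    values[threadIdx.x] = threadIdx.x;
    values[threadIdx.x + 1] = threadIdx.x;
}

int main() {
    // Request 8-byte shared memory banks (supported on cc 3.x).
    // With the default cudaSharedMemBankSizeFourByte setting, each
    // double straddles two 4-byte banks and the profiler reports replays.
    cudaError_t err = cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
    if (err != cudaSuccess) {
        printf("config error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    conflict<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Profiling this version under Nsight should show zero shared memory replay overhead, since in 8-byte mode each `double` maps to exactly one bank.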