I have to use a shared memory array of 64 elements, twice the number of banks and of threads in a warp. How should I index it so that the accesses are free of bank conflicts?
Bank-Conflict-Free Access in shared memory
Asked by Behzad Baghapour

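For 32-bit accesses you can use the plain strided access pattern:
__shared__ int shared[32 * 3];            // room for base + stride * tid when base < 3
const int stride = 3;                     // any odd stride works
int data = shared[base + stride * tid];   // tid = threadIdx.x
This is conflict-free whenever stride is odd: an odd stride shares no factor with the 32 banks, so the 32 threads of a warp fall into 32 different banks.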
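For elements wider than 32 bits you can use a trick like this: give each element a size that is an odd number of 32-bit words. Here each element is three words, so threads reading the same member are three words (an odd stride) apart, and the reads are conflict-free: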
struct type
{
int x, y, z;
};
__shared__ struct type shared[32];
struct type data = shared[base + tid];
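Let's assume you're using compute capability 1.x, so your shared memory has 16 banks (bank conflicts are resolved per half-warp of 16 threads), and each thread has to access 2 elements in shared memory.
You want each thread to hit the same bank for both of its elements: if you lay the data out so that a thread's two elements are 16 words apart (elements tid and tid + 16), each half-warp touches banks 0 to 15 exactly once per access and you avoid bank conflicts. With 32 banks, as in the question, the same layout puts a thread's elements 32 apart, at tid and tid + 32 in the 64-element array.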
I used this pattern for storing complex floats; since I had an array of complex floats, it looked like
where the +1 padding avoids serialization in transposed access patterns: it shifts each row by one bank, so threads reading down a column hit different banks.