OpenCL nested loop misalignment


I'm trying to use the GPU for some image processing. My kernel function raises a "misalignment" exception, described as:

The thread tried to read or write data that is misaligned on hardware that does not provide alignment. For example, 16-bit values must be aligned on 2-byte boundaries; 32-bit values on 4-byte boundaries, and so on.
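To make the alignment rule in that message concrete, here is a minimal host-side C++ sketch of the check it describes: an address is suitably aligned for a type when it is a multiple of that type's alignment. The helper names `isAligned` and `isAlignedFor` are hypothetical and are not part of OpenCL or its C++ wrapper.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment` (in bytes).
// `alignment` must be non-zero.
inline bool isAligned(std::uintptr_t address, std::size_t alignment) {
    return address % alignment == 0;
}

// Convenience wrapper for pointers: checks the address against the
// natural alignment of T (for example, 4 bytes for a 32-bit integer
// on typical platforms).
template <typename T>
bool isAlignedFor(const void* p) {
    return isAligned(reinterpret_cast<std::uintptr_t>(p), alignof(T));
}
```

For instance, address 6 satisfies a 2-byte boundary but not a 4-byte one, so a 16-bit read there would be aligned while a 32-bit read would not.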

I reduced the kernel to nothing but loops, and I still get this problem. The reduced kernel function:

__kernel void TestKernel(
    global const uchar* iImage,
    global uchar* oImage,
    uint width,
    uint height,
    uchar dif,
    float power)
{
    // One work-item per image row.
    uint y = get_global_id(0);

    // Skip work-items that fall past the last row.
    if (y >= height)
        return;

    for (uint x = 0; x < width; ++x) {
        for (uint i = 0; i < 5; ++i) {
            uint sum = 0;
            for (uint j = 0; j < 5; ++j) {
                sum += 3;
            }
        }
    }
}

(The program throws the exception in the second loop.)

I'm using the C++ wrapper to call my kernel:

kernel.setArg(iArg++, iImage);
kernel.setArg(iArg++, oImage);
kernel.setArg(iArg++, header.GetVal(header.Width));
kernel.setArg(iArg++, header.GetVal(header.Height));
kernel.setArg(iArg++, (unsigned char)10);
kernel.setArg(iArg++, saturation);

queue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(header.GetVal(header.Height)), cl::NDRange(128));

oImage and iImage are cl::Buffer

saturation is float

header.GetVal() returns int
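One detail of the enqueueNDRangeKernel call may be worth checking, though it is not necessarily the cause of the exception. In OpenCL 1.x the global work size must be evenly divisible by the local work size (128 here), and the `y >= height` guard in the kernel suggests the global size is meant to be padded. A minimal sketch of that padding, assuming a hypothetical helper `roundUpToMultiple` that is not part of the OpenCL C++ wrapper:

```cpp
#include <cstddef>

// Rounds `value` up to the nearest multiple of `multiple`.
// `multiple` must be greater than zero.
inline std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple) {
    return ((value + multiple - 1) / multiple) * multiple;
}
```

Under that assumption the global range would be written as cl::NDRange(roundUpToMultiple(height, 128)), where height is the image height as a non-negative value. A height of 1080 would be padded to 1152, and the extra work-items would return early through the bounds check in the kernel.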

I'm using Visual Studio 2015 with the CodeXL plugin and running the program on an AMD Spectre (Radeon R7) GPU.

What can cause this problem?
