Halide generator produces a green image when compiled for GPU


When I compile a Halide generator for a CUDA GPU target, I get a green image (on the CPU the image is correct). Here is the algorithm:

        output(c,x,y) = Halide::cast<uint8_t> (input(mux(c, {1,0,2,3,0,2}), x, y));
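
To check the GPU result against a known-good one, a plain CPU reference of the same channel remapping can help. The sketch below is only an illustration: the function name `reference_decode`, the 16-bit input type, and the memory layout (channel index innermost) are assumptions that the question does not state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Channel lookup table copied from the mux() call in the question.
constexpr std::array<int, 6> kChannelMap = {1, 0, 2, 3, 0, 2};

// Plain CPU reference for
//   output(c, x, y) = cast<uint8_t>(input(kChannelMap[c], x, y)).
// Assumed layout (not stated in the question): the channel index is innermost,
// so element (c, x, y) lives at c + channels * (x + width * y).
// The 16-bit input type is also an assumption.
std::vector<std::uint8_t> reference_decode(const std::vector<std::uint16_t>& input,
                                           int in_channels, int width, int height) {
    const int out_channels = static_cast<int>(kChannelMap.size());
    std::vector<std::uint8_t> output(
        static_cast<std::size_t>(out_channels) * width * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t pixel =
                static_cast<std::size_t>(x) + static_cast<std::size_t>(width) * y;
            for (int c = 0; c < out_channels; ++c) {
                // Pick the source channel for this output channel.
                const std::uint16_t v = input[kChannelMap[c] + in_channels * pixel];
                // Narrowing cast keeps the low 8 bits.
                output[c + out_channels * pixel] = static_cast<std::uint8_t>(v);
            }
        }
    }
    return output;
}
```

Comparing this output byte by byte with the buffer returned by the GPU build shows whether the mismatch is in the pixel values themselves or in how the data moves between host and device.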

And the schedule:

        Target target = get_target();
        std::cout << "target is :" << target;
        if (target.has_gpu_feature()) {
            // schedule for gpu
            output.gpu_tile(x, y, xi, yi, 32, 32)
                  .bound_extent(c, 6)
                  .unroll(c);
        }

I configure the target in the CMakeLists.txt file:

add_halide_library(yuv422decoder FROM yuv422.generator
               TARGETS x86-64-windows-avx-avx2-cuda-f16c-fma-sse41)

I also checked that CUDA is installed correctly by building the CUDA samples, which run properly:

cuda\cuda-samples\bin\win64\Release>histogram.exe    
Initializing 256-bin histogram...
Running 256-bin GPU histogram for 67108864 bytes (16 runs)...

histogram256() time (average) : 0.01611 sec, 4165.7798 MB/sec

histogram256, Throughput = 4165.7798 MB/s, Time = 0.01611 s, Size = 67108864 Bytes, NumDevsUsed = 1, Workgroup = 192

Validating GPU results...
 ...reading back GPU results
 ...histogram256CPU()
 ...comparing the results
 ...256-bin histograms match

Shutting down 256-bin histogram...


Shutting down...

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

[histogram] - Test Summary
Test passed

Ivan Stimac On

OK, so I managed to make it work. In the main program I added the following line for the input buffer:

input.set_host_dirty();

and after the call to the generator I added the following line for the output buffer:

out.copy_to_host();
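
Why these two calls matter can be sketched with a toy model of a buffer that keeps separate host and device copies plus dirty flags. This is not Halide's buffer class: `ToyBuffer` and `toy_gpu_pipeline` are hypothetical names, and the model only mimics the idea that data is transferred when the matching dirty flag is set.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a buffer with separate host and device copies.
// NOT Halide's Buffer class; it only mimics the dirty-flag idea.
struct ToyBuffer {
    std::vector<std::uint8_t> host;
    std::vector<std::uint8_t> device;  // zero-filled here; real device memory may hold garbage
    bool host_dirty = false;
    bool device_dirty = false;

    explicit ToyBuffer(std::size_t n) : host(n, 0), device(n, 0) {}

    // Tell the buffer that the host copy was modified.
    void set_host_dirty() { host_dirty = true; }

    // Upload only when the host copy is marked as newer.
    void copy_to_device() {
        if (host_dirty) {
            device = host;
            host_dirty = false;
        }
    }

    // Download only when the device copy is marked as newer.
    void copy_to_host() {
        if (device_dirty) {
            host = device;
            device_dirty = false;
        }
    }
};

// Toy "GPU pipeline": reads in.device, writes out.device,
// and marks the output as newer on the device side.
void toy_gpu_pipeline(ToyBuffer& in, ToyBuffer& out) {
    in.copy_to_device();
    for (std::size_t i = 0; i < out.device.size(); ++i) {
        out.device[i] = static_cast<std::uint8_t>(in.device[i] + 1);
    }
    out.device_dirty = true;
}
```

In this model, filling `in.host` without calling `set_host_dirty()` means the pipeline never sees the new input, and reading `out.host` without calling `copy_to_host()` returns stale data. Either mistake would produce a wrong image on the GPU path while the CPU path stays correct.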

For now it is very slow, but I guess I need to tune my schedule.