Using Pycuda: LogicError: cuMemAlloc failed: an illegal memory access

103 Views Asked by At

I am creating a python program with Pycuda to implement Sobel's algorithm in parallel with GPU. My code is:

import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

def loadImage(image_path):
    # Load the image
    image = Image.open(image_path)  # Convert to grayscale
    image_array = np.asarray(image)
    return image_array

def applySobel(image):
    # Convert image to grayscale if needed
    if len(image.shape) == 3:
        image = np.mean(image, axis=2)

    # Create a GPU device array from the image
    image_gpu = cuda.to_device(image.astype(np.float32))

    # Allocate GPU memory for the output
    output_gpu = cuda.mem_alloc_like(image_gpu)

    # Define the Sobel filter kernels
    sobel_x_kernel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    sobel_y_kernel = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float32)

    # Compile the CUDA kernel
    mod = SourceModule("""
        __global__ void sobelFilter(const float* input, float* output, int width, int height) {
            int row = blockIdx.y * blockDim.y + threadIdx.y;
            int col = blockIdx.x * blockDim.x + threadIdx.x;

            if (row < height && col < width) {
                float sum_x = 0.0;
                float sum_y = 0.0;

                for (int i = -1; i <= 1; ++i) {
                    for (int j = -1; j <= 1; ++j) {
                        int neighbor_row = row + i;
                        int neighbor_col = col + j;

                        if (neighbor_row >= 0 && neighbor_row < height && neighbor_col >= 0 && neighbor_col < width) {
                            float pixel = input[neighbor_row * width + neighbor_col];
                            sum_x += pixel * sobel_x_kernel[i + 1][j + 1];
                            sum_y += pixel * sobel_y_kernel[i + 1][j + 1];
                        }
                    }
                }

                output[row * width + col] = sqrt(sum_x * sum_x + sum_y * sum_y);
            }
        }
    """)

    # Get the compiled kernel function
    sobel_filter_func = mod.get_function("sobelFilter")

    # Set the block and grid dimensions
    block_dim = (16, 16)
    grid_dim = ((image.shape[1] - 1) // block_dim[0] + 1, (image.shape[0] - 1) // block_dim[1] + 1)

    # Call the CUDA kernel
    sobel_filter_func(image_gpu, output_gpu, np.int32(image.shape[1]), np.int32(image.shape[0]), block=(block_dim[0], block_dim[1], 1), grid=(grid_dim[0], grid_dim[1]))

    # Copy the result back to the CPU
    output = np.empty_like(image)
    cuda.memcpy_dtoh(output, output_gpu)

    return output

    # Copy the processed image back to CPU
    processed_image = np.empty_like(image)
    cuda.memcpy_dtoh(processed_image, image_gpu)

    return processed_image

def showImage(image):
    # Display the image
    plt.imshow(image, cmap="gray")
    plt.axis("off")
    plt.show()

#Test the code
image_path = "original_image.PNG"

#Load the image
image = loadImage(image_path)

#Apply Sobel filtering
processed_image = applySobel(image)

#Show the original and processed images
showImage(image)
showImage(processed_image)

When try to run, give this:

---------------------------------------------------------------------------
LogicError                                Traceback (most recent call last)
<ipython-input-17-404cd1292d54> in <cell line: 92>()
     90 
     91 #Apply Sobel filtering
---> 92 processed_image = applySobel(image)
     93 
     94 #Show the original and processed images

1 frames
/usr/local/lib/python3.10/dist-packages/pycuda/driver.py in to_device(bf_obj)
   1077     else:
   1078         bf = buffer(bf_obj)
-> 1079     result = mem_alloc(len(bf))
   1080     memcpy_htod(result, bf)
   1081     return result

LogicError: cuMemAlloc failed: an illegal memory access was encountered

The same management of memory I used in other algorithm and went well, but this not. Please, can anyone help me to fix problem?

I would like somebody give code for fix my problem or at least an example available to review.

1

There are 1 best solutions below

0
Martin Finke On

The problem is that the object passed to the mem_alloc_like function must support the nbytes attribute. Therefore, you need to use the image object and not the image_gpu. Correct the relevant line to:

output_gpu = cuda.mem_alloc_like(image)

I also noticed that in the CUDA kernel you refer to the fields sobel_x_kernel and sobel_y_kernel, which were not passed to the kernel. That will have to be done. The procedure is similar to when transferring an image.