Converting an RGB LUT to YUV and then using it to colour-correct a UYVY frame?


I have an RGB colour-space LUT (33x33x33x3) which I expand by trilinear interpolation into a full 256x256x256x3 LUT, and then convert to YUV (or UYV, for simplicity of use). I then take frames one by one from a UYVY video stream and colour-correct them with this LUT. The problem is that I am getting weird colours.

My LUT file stores its values as floats.
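
For context, the trilinear expansion step works along these lines (a simplified sketch, not my exact kernel; it assumes the 33x33x33 LUT values are already scaled to 0-255):

// Sketch: expand a 33x33x33x3 float LUT to 256x256x256x3 uint8_t by
// trilinear interpolation (illustrative names, not my exact code).
__global__ void expandLUT(const float *small33, uint8_t *big256) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    int g = blockIdx.y * blockDim.y + threadIdx.y;
    int b = blockIdx.z * blockDim.z + threadIdx.z;
    if (r >= 256 || g >= 256 || b >= 256) return;

    // Map each 0-255 channel value onto the 0-32 grid of the small LUT.
    float fr = r * 32.0f / 255.0f, fg = g * 32.0f / 255.0f, fb = b * 32.0f / 255.0f;
    int r0 = (int)fr, g0 = (int)fg, b0 = (int)fb;
    int r1 = min(r0 + 1, 32), g1 = min(g0 + 1, 32), b1 = min(b0 + 1, 32);
    float dr = fr - r0, dg = fg - g0, db = fb - b0;

    for (int c = 0; c < 3; ++c) {
        // Blend the 8 surrounding lattice points for this channel.
        #define S(i, j, k) small33[(((i) * 33 + (j)) * 33 + (k)) * 3 + c]
        float c00 = S(r0, g0, b0) * (1 - dr) + S(r1, g0, b0) * dr;
        float c01 = S(r0, g0, b1) * (1 - dr) + S(r1, g0, b1) * dr;
        float c10 = S(r0, g1, b0) * (1 - dr) + S(r1, g1, b0) * dr;
        float c11 = S(r0, g1, b1) * (1 - dr) + S(r1, g1, b1) * dr;
        #undef S
        float c0 = c00 * (1 - dg) + c10 * dg;
        float c1 = c01 * (1 - dg) + c11 * dg;
        float val = c0 * (1 - db) + c1 * db;
        big256[((r * 256 + g) * 256 + b) * 3 + c] = (uint8_t)fminf(fmaxf(val, 0.0f), 255.0f);
    }
}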

My kernel that converts the RGB LUT to UYV:

// input is a 256x256x256x3 RGB interpolated LUT containing values from 0-255
__global__ void cudargblut2yuv(uint8_t *input, uint8_t *output) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int idy = blockIdx.y * blockDim.y + threadIdx.y;
    int idz = blockIdx.z * blockDim.z + threadIdx.z;
    if (idx < 256 && idy < 256 && idz < 256) {
        int index = idx * 256 * 256 * 3 + idy * 256 * 3 + idz * 3;
        float red = input[index];
        float green = input[index + 1];
        float blue = input[index + 2];
        float y = 16 + 0.256 * red + 0.504 * green + 0.0979 * blue;
        float u = 128 + 0.439 * red - 0.368 * green - 0.0714 * blue;
        float v = 128 - 0.148 * red  - 0.291 * green + 0.439 * blue;
        
        // clamping 0-255
        if(y > 255) y = 255;
        else if(y < 0) y = 0;
        if(u < 0) u = 0;
        else if(u > 255) u = 255;
        if(v < 0) v = 0;
        else if(v > 255) v = 255;
        
        output[index] = u;
        output[index + 1] = y;
        output[index + 2] = v;
    }
}

My kernel that applies the LUT to the frame:

__global__
void applyLUTKernel(const uint8_t* input, uint8_t* output, int frameSize, const uint8_t* lut) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    frameSize >>= 1;
    for(int i = index; i < frameSize; i += stride) {
        // UYV values from UYVY frame
        uint8_t U = input[(i << 2)];
        uint8_t Y1 = input[(i << 2) + 1];
        uint8_t V = input[(i << 2) + 2];
        uint8_t Y2 = input[(i << 2) + 3];

        uint8_t pixel1U = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V];
        uint8_t pixel1Y = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V + 1];
        uint8_t pixel1V = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V + 2];

        uint8_t pixel2U = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V];
        uint8_t pixel2Y = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V + 1];
        uint8_t pixel2V = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V + 2];

        // getting corresponding LUT[U1][Y1][V1] values to put back into the frame
        output[(i << 2)] = (pixel1U + pixel2U) >> 1;
        output[(i << 2) + 1] = pixel1Y; 
        output[(i << 2) + 2] = (pixel1V + pixel2V) >> 1;
        output[(i << 2) + 3] = pixel2Y;

        // normal frame
        // output[(i << 2)] = U;
        // output[(i << 2) + 1] = Y1;
        // output[(i << 2) + 2] = V;
        // output[(i << 2) + 3] = Y2;
    }
}

Could someone please point out where I am going wrong?

I suspect that I am either converting the LUT to UYV incorrectly or substituting the resulting LUT values back into the frame incorrectly. I believe my interpolation is accurate, because previously I converted each UYVY frame to RGB and then applied the RGB LUT, and that worked exactly as I expected.

I take the average of the two pixels' U values (and likewise their V values) because neighbouring pixels should not differ greatly in chroma, so averaging them seems reasonable.

What my frames look like:

Original frame

LUT corrected incorrect frame

Expected LUT corrected frame


2 Answers

Answer by Martin Brown:

One problem you have is that you are using unsigned byte types for arithmetic on pixel values where overflow is possible. When the values are added while being put back into the output stream, the intermediate sum will be computed modulo 256! The following minor changes to your code ought to improve things and prevent loss of significant data to overflow.

    int32_t pixel1U = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V];
    uint8_t pixel1Y = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V + 1];
    int32_t pixel1V = lut[256 * 256 * 3 * U + 256 * 3 * Y1 + 3 * V + 2];

    int32_t pixel2U = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V];
    uint8_t pixel2Y = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V + 1];
    int32_t pixel2V = lut[256 * 256 * 3 * U + 256 * 3 * Y2 + 3 * V + 2];

    // getting corresponding LUT[U1][Y1][V1] values to put back into the frame
    output[(i << 2)]     = (uint8_t) ((pixel1U + pixel2U) >> 1);
    output[(i << 2) + 1] = pixel1Y;
    output[(i << 2) + 2] = (uint8_t) ((pixel1V + pixel2V) >> 1);
    output[(i << 2) + 3] = pixel2Y;

Note that the U and V components would be corrupted if a contributing sum (pixel1U + pixel2U, or pixel1V + pixel2V) exceeded 255, a situation that seems to occur mostly on the green grass.

You will probably also find that your code runs faster if you promote the working variables inside the arithmetic loop to 32-bit integers. The code would be much clearer if you computed each index just once.

The same stylistic observation applies to using "i << 2" as an index throughout: you might as well multiply both the stride and frameSize by 4, as in the sketch below.
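
Something along these lines (a sketch reusing the names from your kernel, not a drop-in replacement):

// Sketch: one base index per lookup, byte offsets instead of "i << 2".
int pairCount = frameSize >> 1;              // number of 4-byte UYVY groups
for (int i = index * 4; i < pairCount * 4; i += stride * 4) {
    int32_t U  = input[i],     Y1 = input[i + 1];
    int32_t V  = input[i + 2], Y2 = input[i + 3];

    // Each base index is computed once; entries are stored as U, Y, V.
    const uint8_t *p1 = &lut[(256 * 256 * U + 256 * Y1 + V) * 3];
    const uint8_t *p2 = &lut[(256 * 256 * U + 256 * Y2 + V) * 3];

    output[i]     = (uint8_t)((p1[0] + p2[0]) >> 1);   // averaged U
    output[i + 1] = p1[1];                             // Y1
    output[i + 2] = (uint8_t)((p1[2] + p2[2]) >> 1);   // averaged V
    output[i + 3] = p2[1];                             // Y2
}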

Answer by Shikhin Dahikar:

This was not successful, so instead of converting the LUT's colour space I now temporarily convert each UYVY pixel to RGB, apply the RGB LUT, and convert the result back to UYVY. For my use case this works much better than converting the entire frame (or the LUT) to the other colour space at once.
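
Roughly, the per-pixel round trip looks like this (a simplified sketch with BT.601 constants; the names are illustrative and it assumes the interpolated 256x256x256x3 RGB LUT, not my exact code):

// Sketch: convert one YUV sample to RGB, apply the RGB LUT, convert back.
__device__ void correctSample(uint8_t Y, uint8_t U, uint8_t V,
                              const uint8_t *rgbLut,
                              uint8_t *outY, uint8_t *outU, uint8_t *outV) {
    // BT.601 YUV -> RGB (studio range)
    float c = Y - 16.0f, d = U - 128.0f, e = V - 128.0f;
    int r = min(max((int)(1.164f * c + 1.596f * e), 0), 255);
    int g = min(max((int)(1.164f * c - 0.392f * d - 0.813f * e), 0), 255);
    int b = min(max((int)(1.164f * c + 2.017f * d), 0), 255);

    // Look up the 256x256x256x3 RGB LUT, indexed as [R][G][B].
    int base = ((r * 256 + g) * 256 + b) * 3;
    float R = rgbLut[base], G = rgbLut[base + 1], B = rgbLut[base + 2];

    // BT.601 RGB -> YUV
    *outY = (uint8_t)min(max((int)(16.0f  + 0.257f * R + 0.504f * G + 0.098f * B), 0), 255);
    *outU = (uint8_t)min(max((int)(128.0f - 0.148f * R - 0.291f * G + 0.439f * B), 0), 255);
    *outV = (uint8_t)min(max((int)(128.0f + 0.439f * R - 0.368f * G - 0.071f * B), 0), 255);
}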