How do I copy a single struct into global memory?

Question

545 Views Asked by Niels Slotboom At 12 August 2019 at 10:56

I want to copy a set of initialization values that every thread uses into __global__ memory. I have collected them in a single struct. However, I run into several problems getting it into __global__ memory. First, VS2015 tells me that "dynamic initialization is not supported for a __constant__ variable" for this line: __constant__ initValsStruct d_initVals;

Second, it tells me that there is "no suitable conversion function from initValsStruct to const void *" in this line: cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));

This might be a quite basic C or CUDA question, but what is the best way to copy a single struct to __global__ memory?

I tried the code below. It is based on a sample I found on the CUDA dev forum, where some __constant__ memory (an int array of 1024 elements) gets initialized in the same way.

typedef struct
{
    unsigned int voxels_x = 0;
    unsigned int voxels_y = 0;
    unsigned int voxels_z;

    //Input and output data amounts
    unsigned int n_lines;
    unsigned int TD_samples;

    //amount of total calculations
    unsigned int n_calc;
} initValsStruct;

initValsStruct h_initVals; //host struct to be copied into __global__ memory
__constant__ initValsStruct d_initVals; //where it has to be copied to

int main(){
    //here I initialize every element of the initValsStruct h_initVals, so it is initialized

    cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));
}

This is how I access it:

typedef struct
{
    int device = 0;
    double  *d_xre, *d_xim, //input device arrays
            *d_yre, *d_yim, //output device arrays
            *h_xre_pl, *h_xim_pl, //page locked input host arrays
            *h_yre_pl, *h_yim_pl; //page locked output host arrays
} IOdataPtr;

__device__ void computation(int currentComputation, IOdataPtr ptr) //actual computation kernel
{
    int index;

    for (int i = 0; i < d_initVals.n_lines * PARAMETERS_PER_LINE; i++) {
        index = currentComputation * d_initVals.n_lines * PARAMETERS_PER_LINE + i;
        ptr.d_yre[index] = ptr.d_xre[index];
        ptr.d_yim[index] = ptr.d_xim[index];
    }
}

I would expect this to compile and run the same way it does when I pass the initVals struct as an argument to the kernel.

**Michael** · Answer 1 · 2019-08-13T01:21:12.427000

Reading your code, it's unclear to me what you're trying to do. But your question was "I want to copy a set of initialization values that every thread uses into global memory", so I'm going to answer that question in a very direct way.

Data is copied from the host to the device via the cudaMemcpy functions. A worked-out example is below.

The struct:

typedef struct
{
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;

    // Input and output data amounts
    unsigned int n_lines;
    unsigned int TD_samples;

    // amount of total calculations
    unsigned int n_calc;
} initValsStruct;

Initialize it on the host and copy it to the device with cudaMemcpy:

int main(void) {
    initValsStruct h_params;
    initValsStruct *d_params = NULL;
    h_params.n_calc = 10;
    // etc. initialization

    // Allocate device (global) memory for the struct before copying into it
    cudaMalloc((void **)&d_params, sizeof(initValsStruct));

    // Copy struct to device
    cudaMemcpy(d_params, &h_params, sizeof(initValsStruct), cudaMemcpyHostToDevice);

    // The memory at d_params now holds whatever values were in h_params.
    // Unlike this example, be sure to use proper error-checking
    // for all CUDA API calls

    // some kernel calls

    // done: release the device allocation
    cudaFree(d_params);
    return 0;
}
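To make the error-checking remark concrete, here is a minimal sketch of the same cudaMalloc/cudaMemcpy pattern with every call checked. The CUDA_CHECK macro, the readNCalc kernel, and the roundTripNCalc helper are names invented for this illustration, not part of the CUDA API or of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA API): abort on any CUDA error.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

typedef struct
{
    unsigned int voxels_x, voxels_y, voxels_z;
    unsigned int n_lines, TD_samples;
    unsigned int n_calc;
} initValsStruct;

// Hypothetical kernel: a single thread reads one field through the
// global-memory pointer and writes it to an output location.
__global__ void readNCalc(const initValsStruct *params, unsigned int *out)
{
    *out = params->n_calc;
}

// Hypothetical host helper: copies the struct to the device, launches the
// kernel, and returns the value the kernel read.
unsigned int roundTripNCalc(unsigned int n_calc)
{
    initValsStruct h_params = {};  // zero every field on the host
    h_params.n_calc = n_calc;

    initValsStruct *d_params = nullptr;
    unsigned int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_params, sizeof(initValsStruct)));
    CUDA_CHECK(cudaMalloc(&d_out, sizeof(unsigned int)));
    CUDA_CHECK(cudaMemcpy(d_params, &h_params, sizeof(initValsStruct),
                          cudaMemcpyHostToDevice));

    readNCalc<<<1, 1>>>(d_params, d_out);
    CUDA_CHECK(cudaGetLastError());  // reports launch failures

    unsigned int h_out = 0;
    // A blocking cudaMemcpy on the default stream runs after the kernel finishes.
    CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(unsigned int),
                          cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_out));
    CUDA_CHECK(cudaFree(d_params));
    return h_out;
}
```

Calling roundTripNCalc(10) from main should hand back the same n_calc that was set on the host, which is a quick way to confirm the struct really arrived in global memory.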

You could also use cudaMallocManaged, which is convenient and a little cleaner. I highly recommend it.
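A rough sketch of the cudaMallocManaged variant, assuming a device and driver that support unified memory. The doubleNCalc kernel and the managedDemo helper are hypothetical names used only for illustration.

```cuda
#include <cuda_runtime.h>

typedef struct
{
    unsigned int voxels_x, voxels_y, voxels_z;
    unsigned int n_lines, TD_samples;
    unsigned int n_calc;
} initValsStruct;

// Hypothetical kernel: modifies the struct in place so the host can
// observe that both sides share the same allocation.
__global__ void doubleNCalc(initValsStruct *params)
{
    params->n_calc *= 2;
}

// Hypothetical host helper: returns the doubled value, or 0 on any CUDA error.
unsigned int managedDemo(unsigned int n_calc)
{
    initValsStruct *params = nullptr;
    if (cudaMallocManaged(&params, sizeof(initValsStruct)) != cudaSuccess)
        return 0;

    // Managed memory is written directly from the host; no cudaMemcpy needed.
    *params = initValsStruct{};
    params->n_calc = n_calc;

    doubleNCalc<<<1, 1>>>(params);

    // The host must wait for the kernel before touching managed memory again.
    cudaError_t err = cudaDeviceSynchronize();
    unsigned int result = (err == cudaSuccess) ? params->n_calc : 0;

    cudaFree(params);
    return result;
}
```

The same pointer is valid on the host and in the kernel, which is why there is no separate h_params/d_params pair here.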

Your kernels and device functions should take an initValsStruct pointer in their signatures.

__device__ void computation(int currentComputation, initValsStruct *params, IOdataPtr *ptr) //actual computation kernel
{
    // do something
}

This puts your structure into global memory, where any device function that receives a pointer to it can use it. Your code uses the __constant__ keyword, which suggests that you're trying to use the device-side constant cache. I recommend using global memory first, to work out how to use the basic features of the CUDA API, and only then delving into the constant cache.

Your struct has default member values (voxels_x = 0 and voxels_y = 0). These count as dynamic initialization, which is forbidden for a __constant__ variable. To use constant memory, redefine your struct without any default values, as I've done above, initialize the struct on the host, and then copy it with cudaMemcpyToSymbol.
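For the constant-memory route, a minimal sketch might look like the following. It keeps the d_initVals and h_initVals names from the question. The readFromConstant kernel and the constantDemo helper are hypothetical, and error handling is reduced to returning 0.

```cuda
#include <cuda_runtime.h>

// No default member initializers, so the __constant__ variable below
// needs no dynamic initialization.
typedef struct
{
    unsigned int voxels_x, voxels_y, voxels_z;
    unsigned int n_lines, TD_samples;
    unsigned int n_calc;
} initValsStruct;

__constant__ initValsStruct d_initVals;  // lives in device constant memory

// Hypothetical kernel: reads the constant struct directly, with no pointer argument.
__global__ void readFromConstant(unsigned int *out)
{
    *out = d_initVals.n_lines * d_initVals.n_calc;
}

// Hypothetical host helper: returns n_lines * n_calc as computed on the
// device, or 0 on any CUDA error.
unsigned int constantDemo(unsigned int n_lines, unsigned int n_calc)
{
    initValsStruct h_initVals = {};  // initialize on the host first
    h_initVals.n_lines = n_lines;
    h_initVals.n_calc = n_calc;

    // The symbol itself is passed, not its address and not a string.
    if (cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct)) != cudaSuccess)
        return 0;

    unsigned int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(unsigned int)) != cudaSuccess)
        return 0;

    readFromConstant<<<1, 1>>>(d_out);

    unsigned int h_out = 0;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return (err == cudaSuccess) ? h_out : 0;
}
```

With the default initializers gone, the __constant__ declaration compiles and the kernel can read d_initVals without receiving it as an argument.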
