I want to copy a set of initialization values that every thread uses into global memory. I have collected them in a single struct. However, I run into several problems getting it there. First, VS2015 tells me that "dynamic initialization is not supported for a __constant__ variable" for this line:
__constant__ initValsStruct d_initVals;
Second, it tells me that there is "no suitable conversion function from initValsStruct to const void *" in this line:
cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));
This might be quite a basic C or CUDA question, but what is the best way to copy a single struct to global memory?
I tried the code below; it is based on a sample I found on the CUDA dev forum, where some __constant__ memory (an int array of 1024 elements) gets initialized in the same way.
typedef struct
{
    unsigned int voxels_x = 0;
    unsigned int voxels_y = 0;
    unsigned int voxels_z;
    //Input and output data amounts
    unsigned int n_lines;
    unsigned int TD_samples;
    //amount of total calculations
    unsigned int n_calc;
} initValsStruct;
initValsStruct h_initVals; //host struct to be copied into global memory
__constant__ initValsStruct d_initVals; //where it has to be copied to
int main(){
    //here I initialize every element of the initValsStruct h_initVals, so it is initialized
    cudaMemcpyToSymbol(d_initVals, &h_initVals, sizeof(initValsStruct));
}
This is how I access it:
typedef struct
{
    int device = 0;
    double *d_xre, *d_xim,       //input device arrays
           *d_yre, *d_yim,       //output device arrays
           *h_xre_pl, *h_xim_pl, //page locked input host arrays
           *h_yre_pl, *h_yim_pl; //page locked output host arrays
} IOdataPtr;
__device__ void computation(int currentComputation, IOdataPtr ptr) //actual computation kernel
{
    int index;
    for (int i = 0; i < d_initVals.n_lines * PARAMETERS_PER_LINE; i++) {
        index = currentComputation * d_initVals.n_lines * PARAMETERS_PER_LINE + i;
        ptr.d_yre[index] = ptr.d_xre[index];
        ptr.d_yim[index] = ptr.d_xim[index];
    }
}
I would expect it to compile and run the same way it does when I pass the initVals struct as an argument to the kernel.
Reading your code, it's unclear to me what you're trying to do. But your question was "I want to copy a set of initialization values that every thread uses into global memory", so I'm going to answer that question in a very direct way.
Data is copied from the host to the device via the cudaMemcpy functions. A worked-out example is below.
The struct:
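A minimal sketch of the global-memory approach, assuming a struct with the same fields as in the question but without default member initializers. The kernel `sumVoxels` and the helper `sumVoxelsOnDevice` are hypothetical names introduced only for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: same fields as the question's struct, no default member initializers.
struct initValsStruct {
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;
    unsigned int n_lines;
    unsigned int TD_samples;
    unsigned int n_calc;
};

// Hypothetical kernel: reads the struct through a pointer into global memory.
__global__ void sumVoxels(const initValsStruct *vals, unsigned int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = vals->voxels_x + vals->voxels_y + vals->voxels_z;
}

// Hypothetical helper: copies the struct to the device, launches the kernel,
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t sumVoxelsOnDevice(const initValsStruct &h_vals, unsigned int *h_out)
{
    initValsStruct *d_vals = nullptr;
    unsigned int *d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_vals, sizeof(initValsStruct));
    if (err == cudaSuccess)
        err = cudaMalloc(&d_out, sizeof(unsigned int));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_vals, &h_vals, sizeof(initValsStruct),
                         cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sumVoxels<<<1, 32>>>(d_vals, d_out);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_vals);  // cudaFree on a null pointer is a no-op
    cudaFree(d_out);
    return err;
}

int main()
{
    initValsStruct h_vals = {64, 64, 32, 128, 2048, 1000};  // initialized on the host
    unsigned int sum = 0;
    cudaError_t err = sumVoxelsOnDevice(h_vals, &sum);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("voxel sum read on the device: %u\n", sum);
    return 0;
}
```

The struct is an ordinary aggregate here, so it can be filled in on the host and copied byte for byte; the kernel only ever sees a device pointer to it.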
Initialize it on the host and copy it to the device with cudaMemcpy.
You could also use cudaMallocManaged, which is convenient and a little cleaner. I highly recommend it.
Your kernels should take an initValsStruct pointer in their function signatures.
This puts your structure into global memory, where it's usable by any device function receiving a pointer to it. Your code seems to be trying to use the __constant__ keyword, suggesting that you're attempting to use the device-side constant cache. I recommend trying global memory first, to work out how to use the basic features of the CUDA API, and then delving into the constant cache. Your struct has some default values (i.e. default member initializers, which count as dynamic initialization), which is forbidden for a __constant__ variable; redefine your struct without any dynamic initialization, as I've done above, then initialize the struct on the host first, then use cudaMemcpyToSymbol.
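For the constant-memory route, a sketch along the same lines, again assuming the struct is redefined without default member initializers. The kernel `readNLines` and the helper `nLinesFromConstant` are hypothetical names used only for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: same fields as the question's struct, no default member initializers,
// so the __constant__ variable below needs no dynamic initialization.
struct initValsStruct {
    unsigned int voxels_x;
    unsigned int voxels_y;
    unsigned int voxels_z;
    unsigned int n_lines;
    unsigned int TD_samples;
    unsigned int n_calc;
};

__constant__ initValsStruct d_initVals;  // lives in device constant memory

// Hypothetical kernel: reads the constant-memory struct directly, no pointer argument.
__global__ void readNLines(unsigned int *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0)
        *out = d_initVals.n_lines;
}

// Hypothetical helper: fills the symbol from a host struct, launches the kernel,
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t nLinesFromConstant(const initValsStruct &h_vals, unsigned int *h_out)
{
    unsigned int *d_out = nullptr;

    // The symbol itself is passed, not its address and not a string name.
    cudaError_t err = cudaMemcpyToSymbol(d_initVals, &h_vals, sizeof(initValsStruct));
    if (err == cudaSuccess)
        err = cudaMalloc(&d_out, sizeof(unsigned int));
    if (err == cudaSuccess) {
        readNLines<<<1, 32>>>(d_out);
        err = cudaGetLastError();  // catches launch failures
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_out);  // cudaFree on a null pointer is a no-op
    return err;
}

int main()
{
    initValsStruct h_initVals = {64, 64, 32, 128, 2048, 1000};  // initialized on the host
    unsigned int n_lines = 0;
    cudaError_t err = nLinesFromConstant(h_initVals, &n_lines);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("n_lines read from constant memory: %u\n", n_lines);
    return 0;
}
```

Any defaults such as `voxels_x = 0` then have to be set on the host before the copy, since the struct itself no longer carries them.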