I'm trying to transfer a buffer containing Array objects to the device, where Array is defined as:
struct Array {
float* const ptr;
const size_t length;
Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};
To construct a buffer of Array objects in host code, I am using placement new, because the class is not copy-assignable.
Normally I would use cudaMemcpy as follows:
Array* arrays = (Array*) malloc(sizeof(Array) * 3);
new (arrays + 0) Array(nullptr, 0);
new (arrays + 1) Array(nullptr, 0);
new (arrays + 2) Array(nullptr, 0);
Array* device_arrays;
cudaMalloc(&device_arrays, sizeof(Array) * 3);
cudaMemcpy((void*) device_arrays, (void*) arrays, sizeof(Array) * 3, cudaMemcpyHostToDevice);
However, since I am now using const members and a constructor, it occurred to me that while the Array class is trivially copyable, it isn't getting "constructed" by cudaMemcpy. Is it valid to use the device_arrays pointer in a kernel, for example:
__global__ void foo(Array* device_arrays) {
int l = device_arrays[0].length;
}
Or do I need to construct the Array object in device code? (If I need to construct it separately, it seems this would only be possible by transferring the ptr and length data in POD form and constructing the Array object in a kernel from that POD data. That does not seem like something that can be automated with a templated function.)
Everything you have shown so far will work, approximately as you have written it (correcting various typos/omissions).
You are initializing each of the 3 structures/objects in host code, and the cudaMemcpy operation copies all of that to device memory. Because Array is trivially copyable, those copied bytes are a valid representation of each object, so no separate construction step is needed on the device. The kernel launch mechanism itself (analogous to the standard C++ pass-by-value function call mechanism) makes the pointer (device_arrays) to the array of Array usable in device code. However, all you have done is set length to zero and initialize each object in the array with an embedded NULL pointer, which is not very interesting.
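To illustrate why a byte-wise copy is enough, here is a host-only C++ sketch (not CUDA code) that uses std::memcpy as a stand-in for cudaMemcpy. It checks at compile time that the const members make Array non-copy-assignable while leaving it trivially copyable. The helper copy_and_read_length is a hypothetical name, and reading the copied objects through the destination buffer assumes C++20 implicit object creation for malloc/memcpy storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

struct Array {
    float* const ptr;
    const size_t length;
    Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};

// const members delete copy assignment...
static_assert(!std::is_copy_assignable<Array>::value,
              "const members make Array non-copy-assignable");
// ...but the type is still trivially copyable, so copying its bytes is fine.
static_assert(std::is_trivially_copyable<Array>::value,
              "Array must be trivially copyable for a byte-wise copy");

// Hypothetical helper: construct 3 objects with placement new, copy their
// bytes into a second buffer (stand-in for cudaMemcpy), and return the
// length stored in element i of the copy.
size_t copy_and_read_length(size_t i) {
    void* src_raw = std::malloc(sizeof(Array) * 3);
    void* dst_raw = std::malloc(sizeof(Array) * 3);
    Array* src = static_cast<Array*>(src_raw);
    for (size_t k = 0; k < 3; ++k) {
        new (src + k) Array(nullptr, k + 1);  // lengths 1, 2, 3
    }
    std::memcpy(dst_raw, src_raw, sizeof(Array) * 3);  // no constructor runs here
    const Array* dst = static_cast<const Array*>(dst_raw);
    size_t result = dst[i].length;
    std::free(src_raw);
    std::free(dst_raw);
    return result;
}
```

For example, copy_and_read_length(2) reads back the length 3 that was set by placement new on the source buffer, even though nothing was constructed in the destination buffer.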
If you decide to do something else with your placement new initialization, you had better make sure that the pointer you pass to the constructor is usable in device code (for example, a pointer allocated by cudaMalloc) if you want to dereference that embedded pointer in device code. Here is an example:
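A minimal sketch of that pattern, assuming a CUDA-capable device and omitting error checking for brevity: each Array receives a data pointer from cudaMalloc, the Array objects themselves are constructed on the host with placement new (the constructor is host-only, as in the question), copied with cudaMemcpy, and a kernel then dereferences the embedded pointers. The names sum_arrays and run_example are hypothetical.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <cuda_runtime.h>

struct Array {
    float* const ptr;
    const size_t length;
    // Host-only constructor: the objects are never constructed on the device.
    Array(float* const ptr, const size_t length) : ptr(ptr), length(length) {}
};

// Each thread sums the elements of one Array through its embedded device pointer.
__global__ void sum_arrays(const Array* device_arrays, int n, float* sums) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float s = 0.0f;
        for (size_t k = 0; k < device_arrays[i].length; ++k)
            s += device_arrays[i].ptr[k];
        sums[i] = s;
    }
}

// Hypothetical helper: builds 3 Arrays of lengths 1, 2, 3 filled with 1.0f,
// launches the kernel, and copies the per-Array sums into host_sums.
void run_example(float host_sums[3]) {
    const int n = 3;
    const size_t lengths[n] = {1, 2, 3};
    const float ones[3] = {1.0f, 1.0f, 1.0f};

    Array* arrays = static_cast<Array*>(std::malloc(sizeof(Array) * n));
    float* data[n];
    for (int i = 0; i < n; ++i) {
        cudaMalloc(&data[i], sizeof(float) * lengths[i]);  // device-usable pointer
        cudaMemcpy(data[i], ones, sizeof(float) * lengths[i], cudaMemcpyHostToDevice);
        new (arrays + i) Array(data[i], lengths[i]);       // construct on the host
    }

    Array* device_arrays;
    float* device_sums;
    cudaMalloc(&device_arrays, sizeof(Array) * n);
    cudaMalloc(&device_sums, sizeof(float) * n);
    // Byte-wise copy of the host-constructed objects, embedded pointers included.
    cudaMemcpy(device_arrays, arrays, sizeof(Array) * n, cudaMemcpyHostToDevice);

    sum_arrays<<<1, n>>>(device_arrays, n, device_sums);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(host_sums, device_sums, sizeof(float) * n, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) cudaFree(data[i]);
    cudaFree(device_arrays);
    cudaFree(device_sums);
    std::free(arrays);
}
```

Calling run_example should yield sums of 1, 2 and 3, one per Array. The key point is that only the data buffers need to come from cudaMalloc; the Array objects can be built entirely in host code and copied over as raw bytes.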