Suppose I have 2 OpenCL-capable devices on my machine (not including CPUs); and suppose that an evil colleague of mine creates a different context for each of them, which I have to work with.
I know I can't share buffers between contexts - not properly and officially, at least. But suppose that I create two OpenCL buffers, one in each context, and pass to each of them the same region of host memory, with the CL_MEM_USE_HOST_PTR flag. e.g.:
enum { size = 1234 };
//...
context_1 = clCreateContext(NULL, 1, &some_device_id, NULL, NULL, NULL);
context_2 = clCreateContext(NULL, 1, &another_device_id, NULL, NULL, NULL);
void* host_mem = malloc(size);
assert(host_mem != NULL);
buff_1 = clCreateBuffer(context_1, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
buff_2 = clCreateBuffer(context_2, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, size, host_mem, NULL);
I realize that, officially,
The result of OpenCL commands that operate on multiple buffer objects created with the same
host_ptror overlapping host regions is considered to be undefined.
But what will actually happen if I copy to this buffer from one device, and from this buffer to another device? I'm specifically interested in the case of (relatively-recent) AMD and NVIDIA GPUs.
If your OpenCL implementation's vendor guarantees some kind of specific behaviour that goes beyond the standard, then go with that and make sure to follow any instructions about limitations to the letter.
If it doesn't, then you have to assume what the standard says.