In my employer's codebase, I'm trying to debug an "invalid memory access" error from cudaMemcpyAsync.
The function call is
CHECK_CUDA( cudaMemcpyAsync(A, B, sizeof(B), cudaMemcpyDeviceToHost, stream) )
where A and B are both int*, but B is allocated on the device with
cudaMalloc((void**) &B, sizeof(B))
When it says invalid memory access, what is it trying to access that is invalid? How can I find out what is being inappropriately accessed?
The invalid memory access error does not actually refer to the cudaMemcpyAsync operation itself, so studying that call alone is unlikely to yield anything useful. CUDA uses an asynchronous reporting mechanism to report device code execution errors "at the next opportunity" via the host API, so the error you are seeing could refer to any kernel execution that took place prior to that call.
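For illustration, here is a minimal sketch (the kernel and this particular CHECK_CUDA definition are assumptions, not your code) showing how checking cudaGetLastError() after a launch and synchronizing right afterwards surfaces such asynchronous errors at the offending kernel, rather than at a later, unrelated call such as your cudaMemcpyAsync:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical checking macro, similar in spirit to the CHECK_CUDA in the question
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void someKernel(int *data, int n)   // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

int main()
{
    const int N = 1024;
    int *d_data = nullptr;
    CHECK_CUDA(cudaMalloc((void **)&d_data, N * sizeof(int)));

    someKernel<<<(N + 255) / 256, 256>>>(d_data, N);
    // Catch launch-configuration errors immediately...
    CHECK_CUDA(cudaGetLastError());
    // ...and force any asynchronous execution error to be reported here,
    // instead of at some later, unrelated host API call.
    CHECK_CUDA(cudaDeviceSynchronize());

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}

With checks like these after each launch, the error is reported at the first API call following the failing kernel, which narrows down which kernel is performing the invalid access.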
To help localize the error, you can try enabling launch blocking when you run your code (for example by setting the environment variable CUDA_LAUNCH_BLOCKING=1). The usefulness of this will probably depend on exactly how the code is written, and on whether any error checking is being done after CUDA kernel launches. If you compile your code with --lineinfo, or even if you don't, you can get additional localization information about the problem using the method indicated here.

The observation in the comment is a good one, and is perhaps an important clue to coding defects. I will note that since B is declared as int*, sizeof(B) evaluates to the size of a pointer (typically 8 bytes), not the size of the buffer you presumably intend to allocate and copy, so both the cudaMalloc and the cudaMemcpyAsync calls shown are only operating on 8 bytes.
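A minimal sketch of that correction, assuming you intend to transfer N ints (the value of N, the pinned host allocation, and the stream handling here are illustrative assumptions, and error checking is omitted for brevity):

#include <cuda_runtime.h>

int main()
{
    const int N = 1024;   // assumed element count for illustration
    int *A = nullptr;     // host destination
    int *B = nullptr;     // device source

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Pinned host memory lets cudaMemcpyAsync actually run asynchronously
    cudaMallocHost((void **)&A, N * sizeof(int));

    // Size the allocation by element count; sizeof(B) would only be the pointer size
    cudaMalloc((void **)&B, N * sizeof(int));

    // ... kernels that populate B would be launched here ...

    // Copy the same number of bytes that were allocated
    cudaMemcpyAsync(A, B, N * sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFreeHost(A);
    cudaFree(B);
    cudaStreamDestroy(stream);
    return 0;
}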
You can take a look at section 12 in this online training series to get a more in-depth treatment of CUDA error reporting, as well as debugging suggestions.