I have 4 GPU devices (11 GB of memory each). I want to add two big matrices A and B (6 GB each), so both matrices cannot fit in the memory of a single device (6 + 6 = 12 GB > 11 GB). How can I perform the addition in this case?
For example, is there a way to perform the addition across multiple GPUs so that one GPU holds matrix A, another holds matrix B, and a third holds the result matrix C?
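Your best bet would be to partition the large arrays, similar to what MATLAB's `distributed` arrays do, so that each GPU holds only a slice of A, B and C. Split over 4 GPUs, each slice is about 1.5 GB, so each device needs roughly 4.5 GB for its three slices, which fits comfortably in 11 GB. (Unfortunately you can't combine `distributed` and `gpuArray`, so you'll have to do things manually.) The basic idea is to do something a bit like this:

After this, each worker has a portion of the overall `C` stored in `myC`.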
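A minimal sketch of the same partitioning idea, written in Python/NumPy rather than MATLAB and running entirely on the CPU: each column block stands in for what one GPU ("worker") would hold. The helper name `partitioned_add` and the parameter `num_workers` are hypothetical, chosen for illustration; on real hardware each pair of slices would be copied to its own device instead of staying in host memory.

```python
import numpy as np


def partitioned_add(A, B, num_workers=4):
    # Hypothetical helper: computes A + B one column block at a time, as if
    # each block of A, B and C lived on a separate GPU ("worker").
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    n_cols = A.shape[1]
    # Integer block boundaries: worker w owns columns edges[w]:edges[w + 1].
    edges = [w * n_cols // num_workers for w in range(num_workers + 1)]
    parts = []
    for w in range(num_workers):
        lo, hi = edges[w], edges[w + 1]
        # On real hardware these slices would be copied to device w;
        # here they are just views into host memory.
        myA = A[:, lo:hi]
        myB = B[:, lo:hi]
        # Element-wise addition needs no data from any other worker.
        myC = myA + myB
        parts.append(myC)
    return parts


# Small example: 4 "workers", each holding roughly a quarter of the columns.
A = np.arange(24, dtype=np.float64).reshape(4, 6)
B = np.ones((4, 6))
parts = partitioned_add(A, B, num_workers=4)
# Reassembled only for checking; on GPUs, C would normally stay distributed.
C = np.hstack(parts)
```

The split is along columns only to keep the example short; splitting by rows works the same way. Because addition is element-wise, the only requirement is that A, B and C are partitioned identically, so each worker can produce its part of C without communicating with the others.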