I have two 3D tensors: tensor A, which has shape [B,N,S], and tensor B, which also has shape [B,N,S]. What I want to get is a third tensor C, which I expect to have shape [B,B,N], where the element C[i,j,k] = np.dot(A[i,k,:], B[j,k,:]). I also want to achieve this in a vectorized way.
Some further info: the two tensors A and B have shape [Batch_size, Num_vectors, Vector_size]. The tensor C is supposed to represent the dot product between each element in the batch from A and each element in the batch from B, across all of the different vectors.
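For concreteness, the desired result can be written with explicit loops, straight from the definition (a sketch; the sizes here are made up):

```python
import numpy as np

# Shapes from the question: [Batch_size, Num_vectors, Vector_size]
nb, n, s = 3, 4, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((nb, n, s))
B = rng.standard_normal((nb, n, s))

# Reference implementation: C[i, j, k] = np.dot(A[i, k, :], B[j, k, :])
C = np.empty((nb, nb, n))
for i in range(nb):
    for j in range(nb):
        for k in range(n):
            C[i, j, k] = np.dot(A[i, k, :], B[j, k, :])

print(C.shape)  # (3, 3, 4)
```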
Hope that it is clear enough, and looking forward to your answers!
The suggested `einsum`, working directly from the `C[i,j,k] = np.dot(A[i,k,:], B[j,k,:])` expression:
`matmul` does a `dot` on the last 2 dimensions, and treats the leading one(s) as batch. In your case `k` is the batch dimension, and `m` is the one that should obey the "last axis of `A`, 2nd-to-last axis of `B`" rule. So rewrite the `ikm,jkm->ijk` subscripts to fit, transposing `A` and `B` accordingly; that alone makes little difference in performance. But now use `matmul` itself, and verify that the values match (though more often than not, if shapes match, the values do too).
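A sketch of both rewrites, with an `allclose` check (array sizes again made up):

```python
import numpy as np

nb, n, s = 3, 4, 5
rng = np.random.default_rng(2)
A = rng.standard_normal((nb, n, s))
B = rng.standard_normal((nb, n, s))

# direct version, for reference
C0 = np.einsum('ikm,jkm->ijk', A, B)

# same contraction with 'k' moved to the front (matmul's batch slot)
C1 = np.einsum('kim,kjm->kij',
               A.transpose(1, 0, 2),
               B.transpose(1, 0, 2)).transpose(1, 2, 0)

# matmul: batch axis first, contraction over last of A / 2nd-to-last of B
C2 = (A.transpose(1, 0, 2) @ B.transpose(1, 2, 0)).transpose(1, 2, 0)

print(np.allclose(C0, C1), np.allclose(C0, C2))  # True True
```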
I won't try to measure memory usage, but the time improvement suggests it too is better.
In some cases `einsum` is optimized to use `matmul`. Here that doesn't seem to be the case, though we could play with its parameters. I'm a little surprised that `matmul` is doing so much better.
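The parameter to play with is `optimize`; `np.einsum_path` shows the contraction plan the optimizer would choose. A sketch (for a plain two-operand contraction like this, there isn't much for the optimizer to reorder):

```python
import numpy as np

nb, n, s = 3, 4, 5
rng = np.random.default_rng(3)
A = rng.standard_normal((nb, n, s))
B = rng.standard_normal((nb, n, s))

# let einsum search for a BLAS-friendly contraction order
C = np.einsum('ikm,jkm->ijk', A, B, optimize=True)

# inspect what (if anything) the optimizer would do
path, info = np.einsum_path('ikm,jkm->ijk', A, B, optimize='optimal')
print(path)
```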
---

I vaguely recall another SO about `matmul` taking a shortcut when the two arrays are the same thing, `A@A`. I used `B=A` in these tests, but that only made a modest difference.
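A minimal way to repeat that `B=A` comparison with `timeit` (the actual numbers depend entirely on your BLAS and machine, so none are claimed here):

```python
import numpy as np
from timeit import timeit

nb, n, s = 32, 16, 64
rng = np.random.default_rng(4)
A = rng.standard_normal((nb, n, s))

# same array on both sides, as in the tests described above
t_einsum = timeit(lambda: np.einsum('ikm,jkm->ijk', A, A), number=20)
t_matmul = timeit(
    lambda: (A.transpose(1, 0, 2) @ A.transpose(1, 2, 0)).transpose(1, 2, 0),
    number=20)

print(f'einsum: {t_einsum:.4f}s  matmul: {t_matmul:.4f}s')
```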
My BLAS etc is standard Linux, nothing special.