As a follow-up to a previous question here, I am trying to implement the following loop, a matrix-vector multiplication in which the vector is a column of the matrix Q selected by the loop iterator:
EDIT: Q cannot be populated beforehand; it is populated as the iterator K progresses.
    for (unsigned K=0; K<N; K++){ // Number of iterations loop
        //... do some stuff
        for (unsigned i=0; i<N; i++){
            float sum = 0;
            for (unsigned j=0; j<N; j++){
                sum += A[j][i]*Q[j][K];
            }
            v[i] = sum;
        }
        //... do some stuff
        // populate next column of Q
    }
Where the dimensions of the arrays are:
A [N x N]
Q [N x (0.5N + 1)]
These arrays have been flattened in order to use them with cublasSgemv(). My question is: is it possible to use cublasSgemv() by telling it where to start accessing d_Q and what the increment between elements is (since it is row-major C++)?
EDIT: multiplied the memory access increment by sizeof(float). It still doesn't work as far as I can tell.
    Niter = 0.5*N + 1;
    for (unsigned K=0;K<N;K++){
        cublasSgemv(handle, CUBLAS_OP_T, N, N, &alpha, d_A, N, (d_Q + sizeof(float)*K*(Niter)), (Niter), &beta, d_v , 1);
    }
I don't think it's possible to index d_Q like that, as I am not getting any results.
SOLVED: the solution by @RobertCrovella is what I was looking for. Thanks.
It is possible to index through your flattened Q matrix the way you propose. Your call to Sgemv should be set up as follows. The pointer to Q should point to the first element of the column in question, and since your matrix is row-major, this is just d_Q + K (using pointer arithmetic, not byte arithmetic). Niter is the stride (in elements, not bytes) between successive elements of the column in question. Note that your code as written would overwrite the results of one matrix-vector multiply with the next, since you are not indexing through d_v, the output vector. So I added some indexing on d_v.

As @JackOLantern points out, it should also be possible to do this in a single step without your loop, by calling Sgemm.
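To make the pointer-and-stride indexing for the Sgemv loop concrete, here is a minimal CPU-only sketch. It does not call cuBLAS: `ref_sgemv` is a hypothetical stand-in that only mimics the gemv indexing contract (column-major matrix with leading dimension, strided input and output vectors), and `max_column_error` is a hypothetical helper that compares it against the triple loop from the question. Two things are assumptions on my part: each result is stored at offset K*N in the output buffer, and N is even so that Niter = N/2 + 1. Note also that a row-major N x N buffer read as column-major is already the transpose, so for the loop as written (which reads A[j][i]) the no-transpose form is the one that matches; if the intended product reads A[i][j], the transposed form applies instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the gemv indexing contract:
//   y = alpha * op(A) * x + beta * y
// with A stored column-major (leading dimension lda) and strided x and y.
// This is NOT cuBLAS, only the same addressing rules on the host.
void ref_sgemv(bool transpose, int m, int n, float alpha,
               const float* A, int lda,
               const float* x, int incx,
               float beta, float* y, int incy)
{
    const int rows = transpose ? n : m;  // length of y
    const int cols = transpose ? m : n;  // length of x
    for (int r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (int c = 0; c < cols; ++c) {
            // Column-major element (i, j) lives at A[i + j * lda].
            const float a = transpose ? A[c + r * lda] : A[r + c * lda];
            sum += a * x[c * incx];
        }
        y[r * incy] = alpha * sum + beta * y[r * incy];
    }
}

// Hypothetical helper: largest absolute difference between the question's
// triple loop and the strided-column gemv, over all Niter columns of Q.
float max_column_error(int N, bool transpose)
{
    assert(N > 0);
    const int Niter = N / 2 + 1;  // columns of Q (assumes even N)
    std::vector<float> A(static_cast<std::size_t>(N) * N);      // row-major N x N
    std::vector<float> Q(static_cast<std::size_t>(N) * Niter);  // row-major N x Niter
    // Small integer-valued test data, so float sums are exact.
    for (std::size_t k = 0; k < A.size(); ++k)
        A[k] = static_cast<float>((k * 7) % 11) - 5.0f;
    for (std::size_t k = 0; k < Q.size(); ++k)
        Q[k] = static_cast<float>((k * 3) % 7) - 3.0f;

    // One output vector per column, so results are not overwritten.
    std::vector<float> v(static_cast<std::size_t>(N) * Niter, 0.0f);
    float worst = 0.0f;
    for (int K = 0; K < Niter; ++K) {
        // Column K of row-major Q starts at Q + K (element offset, not bytes);
        // successive column elements are Niter elements apart.
        ref_sgemv(transpose, N, N, 1.0f, A.data(), N,
                  Q.data() + K, Niter, 0.0f, v.data() + K * N, 1);

        // Reference: the loop from the question, on the flattened arrays.
        for (int i = 0; i < N; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < N; ++j)
                sum += A[j * N + i] * Q[j * Niter + K];
            worst = std::max(worst, std::fabs(sum - v[K * N + i]));
        }
    }
    return worst;
}
```

For example, `max_column_error(6, false)` compares the no-transpose form against the question's loop for a 6 x 6 matrix. The same start pointer, stride, and output offset (d_Q + K, Niter, and an offset into d_v) are what the cuBLAS call would receive, with the offsets counted in elements rather than bytes.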
If your code is not working the way you expect, please provide a complete, compilable example.