I would like to understand whether it is possible to parallelize the following nested loop using OpenMP in Fortran.
For background, I am using this loop to initialize a sparse matrix. The matrix-vector multiplication itself is very fast thanks to the sparse BLAS library. The bottleneck is instead the loops that build the COO representation of the matrix (row indices, column indices and nonzero values). The difficulty is that every iteration increments the shared variable counter, so the position at which an entry is written depends on all previous iterations. So the question is whether this can be parallelized (perhaps using a reduction clause):
    ! upper bound on the number of nonzeros; the actual count is set after the loops
    nnz = N*N
    allocate(row_ind(nnz),col_ind(nnz),values(nnz),stat=istat)
    if (istat/=0) then
        error stop "Allocation of row_ind,col_ind,values failed"
    endif
    counter = 1
    do iz = 1,n_z
        do ia = 1,n_a
            ! row of the sparse matrix for the current state (ia,iz)
            ix = (iz-1)*n_a+ia
            opt_ind = pol_ap_ind(ia,iz)
            do izp = 1,n_z
                if (Pi(iz,izp)/=0.0d0) then
                    ! column of the sparse matrix for the next state (opt_ind,izp)
                    ixp = (izp-1)*n_a+opt_ind
                    values(counter) = Pi(iz,izp)
                    row_ind(counter) = ix
                    col_ind(counter) = ixp
                    counter = counter+1
                endif
            enddo
        enddo
    enddo
    ! shrink the arrays to the actual number of nonzeros
    nnz = counter-1
    row_ind = row_ind(1:nnz)
    col_ind = col_ind(1:nnz)
    values = values(1:nnz)
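
A minimal sketch of one way the loops above might be parallelized, under stated assumptions. The zero pattern of row iz of Pi is the same for every ia, so the number of entries each (iz,ia) pair writes is known in advance and its write position can be computed rather than carried in a shared counter. The helper arrays nnz_row and start, the small sizes, and the test data are hypothetical and introduced only for illustration. The arrays are also allocated with the exact number of nonzeros, so no trimming is needed afterwards.

```fortran
program coo_parallel_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n_a = 4, n_z = 3      ! small sizes, for illustration only
    real(dp) :: Pi(n_z,n_z)
    integer :: pol_ap_ind(n_a,n_z)
    integer, allocatable :: row_ind(:), col_ind(:), row_ref(:), col_ref(:)
    real(dp), allocatable :: values(:), val_ref(:)
    integer :: nnz_row(n_z), start(n_z)         ! hypothetical helper arrays
    integer :: iz, ia, izp, ix, opt_ind, counter, nnz

    ! Made-up test data: a transition matrix with some zeros, and an arbitrary policy
    Pi = reshape([0.5_dp, 0.2_dp, 0.0_dp, &
                  0.5_dp, 0.6_dp, 0.3_dp, &
                  0.0_dp, 0.2_dp, 0.7_dp], [n_z,n_z])
    do iz = 1, n_z
        do ia = 1, n_a
            pol_ap_ind(ia,iz) = 1 + mod(ia+iz, n_a)
        enddo
    enddo

    ! Pass 1 (cheap, serial): nonzeros per row of Pi and first slot of each iz block
    nnz = 0
    do iz = 1, n_z
        nnz_row(iz) = count(Pi(iz,:) /= 0.0_dp)
        start(iz) = nnz + 1
        nnz = nnz + n_a*nnz_row(iz)
    enddo
    allocate(row_ind(nnz), col_ind(nnz), values(nnz))

    ! Pass 2: every (iz,ia) pair writes to its own disjoint slice, so no reduction
    ! or critical section is needed; counter is private to each thread
    !$omp parallel do collapse(2) default(shared) private(izp,ix,opt_ind,counter)
    do iz = 1, n_z
        do ia = 1, n_a
            ix = (iz-1)*n_a + ia
            opt_ind = pol_ap_ind(ia,iz)
            counter = start(iz) + (ia-1)*nnz_row(iz)
            do izp = 1, n_z
                if (Pi(iz,izp) /= 0.0_dp) then
                    values(counter) = Pi(iz,izp)
                    row_ind(counter) = ix
                    col_ind(counter) = (izp-1)*n_a + opt_ind
                    counter = counter + 1
                endif
            enddo
        enddo
    enddo
    !$omp end parallel do

    ! Serial reference (the original loop) to check the parallel result
    allocate(row_ref(nnz), col_ref(nnz), val_ref(nnz))
    counter = 1
    do iz = 1, n_z
        do ia = 1, n_a
            do izp = 1, n_z
                if (Pi(iz,izp) /= 0.0_dp) then
                    val_ref(counter) = Pi(iz,izp)
                    row_ref(counter) = (iz-1)*n_a + ia
                    col_ref(counter) = (izp-1)*n_a + pol_ap_ind(ia,iz)
                    counter = counter + 1
                endif
            enddo
        enddo
    enddo
    if (counter-1 /= nnz) error stop "nonzero count mismatch"
    if (any(row_ind /= row_ref) .or. any(col_ind /= col_ref) .or. any(values /= val_ref)) then
        error stop "parallel and serial COO arrays differ"
    endif
    print *, "nnz =", nnz, " parallel result matches serial loop"
end program coo_parallel_sketch
```

With gfortran this can be built with the -fopenmp flag. Without that flag the directives are treated as comments and the program runs serially. The entries come out in the same order as in the serial loop, which is why the final comparison can be done element by element. Whether this pays off in practice likely depends on the problem size, since the loop mostly moves memory and does little arithmetic.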