I am trying to work with sparse matrices in ArrayFire, but I am getting an access violation somewhere inside the ArrayFire DLL. I tried several examples, always with the same result. The error occurs on both the CUDA and CPU backends.
Code:
This is the code I am trying to run:
af::info();
af_print(af::randu(5, 4));
float v[] = {5, 8, 3, 6};
int r[] = {0, 0, 2, 3, 4};
int c[] = {0, 1, 2, 1};
const int M = 4, N = 4, nnz = 4;
af::array vals = af::array(af::dim4(nnz), v);
af::array row_ptr = af::array(af::dim4(M + 1), r);
af::array col_idx = af::array(af::dim4(nnz), c);
af_print(vals);
af_print(row_ptr);
af_print(col_idx);
// Create sparse array (CSR) from af::arrays containing values,
// row pointers, and column indices.
auto sparseM = af::sparse(M, N, vals, row_ptr, col_idx, AF_STORAGE_CSR);
af_print(sparseM);
auto res = sparseM * sparseM;
af_print(res);
My configuration:
Running on Windows 11, compiling with Visual Studio Build Tools 2022 using CMake in VS Code (details about the system can be found in the ArrayFire output below). My system has two GPUs, one from NVIDIA and one from Intel.
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(217) ] Found 5 OpenCL platforms
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform NVIDIA CUDA
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device NVIDIA GeForce GTX 1050 Ti with Max-Q Design on platform NVIDIA CUDA
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) OpenCL HD Graphics
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) UHD Graphics 630 on platform Intel(R) OpenCL HD Graphics
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) OpenCL
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz on platform Intel(R) OpenCL
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 1 devices on platform Intel(R) FPGA Emulation Platform for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(234) ] Found device Intel(R) FPGA Emulation Device on platform Intel(R) FPGA Emulation Platform for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(229) ] Found 0 devices on platform Intel(R) FPGA SDK for OpenCL(TM)
[platform][1688317712][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\opencl\device_manager.cpp(239) ] Found 4 OpenCL devices
The program fails on the second-to-last line of the code, auto res = sparseM * sparseM;. It also fails when the sparse matrix is multiplied by a non-sparse matrix.
This is the output from above:
[unified][1688317011][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(146) ] Found: afcuda.dll
Loaded 'C:\Windows\System32\DriverStore\FileRepository\nvdmi.inf_amd64_893ed8ff453738db\nvcuda64.dll'.
[unified][1688317012][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(153) ] Device Count: 1.
[unified][1688317012][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(208) ] AF_DEFAULT_BACKEND: cuda
[platform][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(101) ] Attempting to load: forge.dll
[platform][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[mem][1688317031][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DefaultMemoryManager.cpp(128) ] memory[0].max_bytes: 14.7 GB
ArrayFire v3.8.3 (CPU, 64-bit Windows, build 987d5675a)
[0] Intel: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz[mem][1688317044][10220] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cpu\memory.cpp(147) ] nativeAlloc: 1 KB 0x2182dabc380
2
3
4
col_idx
[4 1 1 1]
0
1
2
1
sparseM
Storage Format : AF_STORAGE_CSR
[4 4 1 1]
sparseM: Values
[4 1 1 1]
5.0000
8.0000
3.0000
6.0000
sparseM: RowIdx
[5 1 1 1]
2
0
2
3
4
sparseM: ColIdx
[4 1 1 1]
0
1
2
1
2
Exception thrown at 0x00007FFEF118ADCC (af.dll) in EigenSim.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
The debug stack trace shows:
af.dll!af_get_last_error(char * * str, __int64 * len) Line 46 (f:\buildbot\worker\win10-cuda-installer\build\src\api\unified\error.cpp:46)
af.dll!af::operator*(const af::array & lhs, const af::array & rhs) Line 939 (f:\buildbot\worker\win10-cuda-installer\build\src\api\cpp\array.cpp:939)
EigenSim.exe!testBackend() Line 27 (c:\Development\Eigensim\src\main\RunTestEigenSim.cpp:27)
EigenSim.exe!main() Line 37 (c:\Development\Eigensim\src\main\RunTestEigenSim.cpp:37)
EigenSim.exe!invoke_main() Line 79
This is the output when using the CUDA backend:
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(146) ] Found: afcuda.dll
Loaded 'C:\Windows\System32\DriverStore\FileRepository\nvdmi.inf_amd64_893ed8ff453738db\nvcuda64.dll'.
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(153) ] Device Count: 1.
[unified][1688317714][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\api\unified\symbol_manager.cpp(208) ] AF_DEFAULT_BACKEND: cuda
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(101) ] Attempting to load: forge.dll
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DependencyModule.cpp(104) ] Found: forge.dll
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(494) ] CUDA Driver supports up to CUDA 12.2.0 ArrayFire CUDA Runtime 12.0.0
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(479) ] CUDA driver version(12.2.0) not part of the CudaToDriverVersion array. Please create an issue or a pull request on the ArrayFire repository to update the CudaToDriverVersion variable with this version of the CUDA runtime.
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(562) ] Found 1 CUDA devices
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(590) ] Found device: NVIDIA GeForce GTX 1050 Ti with Max-Q Design (sm_61) (4 GB | ~2076.416015625 GFLOPs | 6 SMs)
[platform][1688317721][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(625) ] AF_CUDA_DEFAULT_DEVICE:
[platform][1688317722][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\cuda\device_manager.cpp(644) ] Default device: 0(NVIDIA GeForce GTX 1050 Ti with Max-Q Design)
ArrayFire v3.8.3 (CUDA, 64-bit Windows, build 987d5675a)
Platform: CUDA Runtime 12.0, Driver: 12020
[0] NVIDIA GeForce GTX 1050 Ti with Max-Q Design, 4096 MB, CUDA Compute 6.1
[mem][1688317723][16000] [ F:\buildbot\worker\win10-cuda-installer\build\src\backend\common\DefaultMemoryManager.cpp(128) ] memory[0].max_bytes: 3 GB
Things I tried
I tried all backends (CUDA, CPU, and OpenCL) and none of them worked. Using different storage formats for the sparse matrix changes the behavior slightly: sometimes it fails when creating the sparse matrix, sometimes later when performing operations on it.
From your wording, I assume your intent is to perform matrix-matrix multiplication. If that's the case, use the function af::matmul instead of the * operator.
For sparse matrices, af::matmul only works with a sparse matrix as the first argument and a dense matrix as the second argument. You can learn more in the ArrayFire documentation. This means that what you are trying to achieve (sparse times sparse) is not currently possible. You have to make the second operand a dense matrix, for example by converting it with af::dense: af::matmul(sparseM, af::dense(sparseM)).
Your code is giving you an error because the * operator performs element-wise multiplication, not matrix-matrix multiplication, and element-wise multiplication is not a supported operation for sparse matrices.
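To check what a sparse-times-dense product should return for the matrix in the question, here is a minimal reference sketch in plain C++ with no ArrayFire dependency. The helper names (csr_to_dense, csr_times_dense, square_question_matrix) are hypothetical and only illustrate the arithmetic. The sketch stores dense matrices row-major, which is an assumption of this example and not how ArrayFire lays out its arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expand an M x N CSR matrix into a dense,
// row-major std::vector (layout chosen for this sketch only).
std::vector<float> csr_to_dense(std::size_t M, std::size_t N,
                                const std::vector<float>& vals,
                                const std::vector<int>& row_ptr,
                                const std::vector<int>& col_idx) {
    assert(row_ptr.size() == M + 1);
    assert(vals.size() == col_idx.size());
    std::vector<float> D(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        // Entries of row i live in [row_ptr[i], row_ptr[i + 1]).
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            D[i * N + static_cast<std::size_t>(col_idx[p])] = vals[p];
        }
    }
    return D;
}

// Hypothetical helper: C = A * B, with A an M x K CSR matrix and
// B a dense, row-major K x N matrix. Returns C as dense row-major.
std::vector<float> csr_times_dense(std::size_t M, std::size_t K, std::size_t N,
                                   const std::vector<float>& vals,
                                   const std::vector<int>& row_ptr,
                                   const std::vector<int>& col_idx,
                                   const std::vector<float>& B) {
    assert(row_ptr.size() == M + 1);
    assert(vals.size() == col_idx.size());
    assert(B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            const std::size_t k = static_cast<std::size_t>(col_idx[p]);
            // Each stored value A(i, k) scales row k of B into row i of C.
            for (std::size_t j = 0; j < N; ++j) {
                C[i * N + j] += vals[p] * B[k * N + j];
            }
        }
    }
    return C;
}

// Squares the 4 x 4 matrix from the question: the sparse matrix times
// its own dense expansion. With ArrayFire, the corresponding call would
// be af::matmul(sparseM, af::dense(sparseM)).
std::vector<float> square_question_matrix() {
    const std::vector<float> v = {5, 8, 3, 6};
    const std::vector<int> r = {0, 0, 2, 3, 4};
    const std::vector<int> c = {0, 1, 2, 1};
    const std::vector<float> dense = csr_to_dense(4, 4, v, r, c);
    return csr_times_dense(4, 4, 4, v, r, c, dense);
}
```

For the data in the question, the row pointers {0, 0, 2, 3, 4} leave row 0 empty, so the matrix has rows (0 0 0 0), (5 8 0 0), (0 0 3 0), (0 6 0 0), and its square has rows (0 0 0 0), (40 64 0 0), (0 0 9 0), (30 48 0 0). The result of af::matmul on a sparse first operand and a dense second operand should contain the same values.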