I'm trying to run a Jupyter notebook that uses TensorFlow to train a neural network on the GPU with CUDA. I'm on a Windows machine, so I had to install everything in WSL2, since recent versions of TensorFlow no longer support GPU acceleration on native Windows.
I followed all the steps described in the NVIDIA documentation and the TensorFlow documentation: I installed the NVIDIA driver, the CUDA toolkit, the cuDNN library, and so on. I'm using conda with a custom environment that contains TensorFlow and the other libraries listed in the guides.
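For context, the environment was set up more or less like this (a rough sketch; the environment name and Python version are just examples, and the exact package list follows the guides):

conda create -n tf-gpu python=3.10
conda activate tf-gpu
pip install tensorflow jupyter    # plus the other libraries listed in the guides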
nvidia-smi detects my 3070 Ti, and when I run a simple test from bash, such as:
python -c "import tensorflow as tf;tf.debugging.set_log_device_placement(True);a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]);b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]);c = tf.matmul(a, b);print(c)"
I get the correct result, and the device placement log confirms that the GPU is actually used.
However, when I run jupyter notebook, open my notebook, and start executing the cells, everything goes fine (the GPU is again detected correctly) until I reach the model.fit() call, which crashes the kernel without any warning.
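My real notebook is bigger, but the part that dies boils down to a completely standard Keras training cell like this (the dummy data and layer sizes here are only for illustration):

import numpy as np
import tensorflow as tf

# the GPU is still visible at this point in the notebook
print(tf.config.list_physical_devices('GPU'))

# tiny dummy dataset and model, just enough to reach model.fit()
x = np.random.rand(256, 10).astype("float32")
y = np.random.randint(0, 2, size=(256,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(x, y, epochs=1)  # <- the kernel dies here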
The only error I can see in the terminal is this line:
Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
I also tried something else. To verify the cuDNN library, I followed the official guide; everything goes fine until I launch the test itself. If I run it as a regular user, I get the same error as above; if I run it as superuser, it passes.
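Concretely, the verification steps were roughly these (paths as given in the cuDNN guide; they may differ slightly on other setups):

cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN          # fails with the libcuda.so error as a regular user
sudo ./mnistCUDNN     # succeeds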
I then tried running the Jupyter notebook as superuser, commenting out the Defaults secure_path line in /etc/sudoers so that (if I understand correctly) the PATH variable is preserved, but in that case the notebook doesn't find the other packages and TensorFlow doesn't use the GPU.
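That attempt looked something like this (the sudoers edit is shown schematically, and the --allow-root flag is just what I needed to start Jupyter as root):

# commented out in /etc/sudoers (edited with visudo), so that sudo keeps my PATH:
#   Defaults secure_path="..."
sudo jupyter notebook --allow-root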
Am I missing something?