GCP VM not installing NVIDIA driver properly


I created the VM using the GCP Console in the browser.

While creating the VM, I selected the image "c2-deeplearning-pytorch-1-8-cu110-v20210619-debian-10" and a T4 GPU.

The VM is created and started, and the console shows a green status icon.

When I then connect with "gcloud compute ssh", it asks whether I want to install the NVIDIA driver. I answer y, but the install fails with a dpkg error and the driver is not installed, even though the script ends with "Nvidia driver installed.":

    This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
    Would you like to install the Nvidia driver? [y/n] y
    Installing Nvidia driver.
    install linux headers: linux-headers-4.19.0-16-cloud-amd64
    E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
    Nvidia driver installed.

I tried to verify whether the driver is installed by running this Python code:

    import torch
    torch.cuda.is_available()  # returns False
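A False from torch.cuda.is_available() does not by itself say whether the driver is missing or only PyTorch cannot see it. The sketch below checks both separately; it assumes Python 3.7+ and that the driver's nvidia-smi tool would be on PATH once installed. gpu_diagnostics is a hypothetical helper name, not part of any library.

```python
import shutil
import subprocess


def gpu_diagnostics():
    """Report separately whether the NVIDIA driver and PyTorch can see a GPU."""
    report = {"nvidia_smi_found": False, "nvidia_smi_ok": False,
              "torch_installed": False, "torch_cuda": False}

    # nvidia-smi ships with the driver; a zero exit code means it reached the driver.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            result = subprocess.run([smi], capture_output=True, text=True, timeout=30)
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass

    # PyTorch only reports CUDA as available if it can talk to a working driver.
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If nvidia_smi_ok is False as well, the driver install itself failed, which matches the dpkg error in the log.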

Has anybody else faced this issue?

Answer by sandeepsign

The solution to my problem was:

  • Run manually: sudo dpkg --configure -a
  • Disconnect from the machine.
  • Connect again over SSH and answer y again when asked to install the NVIDIA driver.

After that, it works.
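The first step can be scripted. This is a minimal sketch, assuming that apt reports "dpkg was interrupted" when it finds all-digit file names in /var/lib/dpkg/updates (my understanding of apt's check, not something stated in this thread). The helper names are hypothetical, and repair_dpkg simply runs the same command the error message asks for.

```python
import os
import subprocess


def has_pending_entries(names):
    """True if any file name is all digits (treated as an unfinished dpkg journal entry)."""
    return any(name.isdigit() for name in names)


def dpkg_was_interrupted(updates_dir="/var/lib/dpkg/updates"):
    """Assumption: mirrors apt's 'dpkg was interrupted' check on the updates directory."""
    try:
        return has_pending_entries(os.listdir(updates_dir))
    except FileNotFoundError:
        return False


def repair_dpkg():
    """Run the command the apt error asks for; requires sudo rights."""
    subprocess.run(["sudo", "dpkg", "--configure", "-a"], check=True)


if __name__ == "__main__":
    if dpkg_was_interrupted():
        repair_dpkg()
        print("dpkg repaired: disconnect, reconnect over SSH and answer y again.")
    else:
        print("dpkg does not look interrupted.")
```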

Answer by razimbres

This is the correct way to install the NVIDIA driver on a GCP instance:

    cd /
    sudo apt purge 'nvidia-*'

Reboot, then:

    cd /
    sudo wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
    sudo sh cuda_11.2.2_460.32.03_linux.run

The installer presents its options in the terminal; adjust them to your configuration.

Reboot again.
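The runfile name encodes two versions: CUDA 11.2.2 and the 460.32.03 driver it bundles. The sketch below only parses that name and builds the installer command line without executing it. The --silent and --driver options come from NVIDIA's CUDA installation guide for an unattended, driver-only install; they are an addition to the interactive steps in this answer, and the function names are hypothetical.

```python
import re

# Matches names like cuda_11.2.2_460.32.03_linux.run: (CUDA version, bundled driver version).
RUNFILE_RE = re.compile(r"^cuda_(\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)_linux\.run$")


def parse_runfile(name):
    """Split a CUDA runfile name into (cuda_version, driver_version)."""
    match = RUNFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected runfile name: {name}")
    return match.group(1), match.group(2)


def install_command(name, driver_only=False):
    """Build (but do not run) the installer invocation.

    Without flags the installer is interactive; with driver_only=True it adds
    the --silent and --driver options for an unattended driver-only install.
    """
    cmd = ["sudo", "sh", name]
    if driver_only:
        cmd += ["--silent", "--driver"]
    return cmd


if __name__ == "__main__":
    runfile = "cuda_11.2.2_460.32.03_linux.run"
    cuda, driver = parse_runfile(runfile)
    print(f"CUDA {cuda} with bundled driver {driver}")
    print(" ".join(install_command(runfile, driver_only=True)))
```

Running it prints "CUDA 11.2.2 with bundled driver 460.32.03" followed by the non-interactive command.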

Answer by Ali

Make sure you are running as root. I know this sounds silly, but on GCP notebook instances the default user is not root, and if you SSH into the instance and run something like gpustat or your own code, you may get errors saying that the NVIDIA drivers are not loaded.

If your user (named jupyter by default) is in the sudoers list, everything will work fine.
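To see which case applies, a small script can report whether it runs as root, whether passwordless sudo is available, and whether it can open the driver's control device. This sketch assumes the device node is /dev/nvidiactl (present once the driver is loaded) and that "sudo -n true" succeeds only when sudo needs no password; privilege_report is a hypothetical name.

```python
import os
import shutil
import subprocess


def privilege_report(device="/dev/nvidiactl"):
    """Report root status, passwordless sudo, and access to an NVIDIA device node."""
    can_sudo = False
    sudo = shutil.which("sudo")
    if sudo is not None:
        try:
            # -n makes sudo fail instead of prompting for a password.
            result = subprocess.run([sudo, "-n", "true"],
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL, timeout=10)
            can_sudo = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass
    return {
        "is_root": os.geteuid() == 0,
        "passwordless_sudo": can_sudo,
        # os.access returns False if the node is missing or not readable/writable.
        "device_accessible": os.access(device, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    for key, value in privilege_report().items():
        print(f"{key}: {value}")
```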

Installing or reinstalling GPU drivers on GCP instances is often complicated, so make sure you actually need to reinstall before attempting other solutions.
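One way to check whether a reinstall is needed is to see if an NVIDIA kernel module is already loaded. This sketch reads /proc/modules and /proc/driver/nvidia/version, assuming (as on typical Linux systems) that the second file exists only while the nvidia module is loaded. The function names are hypothetical.

```python
def parse_module_names(text):
    """Return module names (first column) from /proc/modules-style text."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def loaded_nvidia_modules(modules_file="/proc/modules"):
    """List loaded kernel modules whose names start with 'nvidia'."""
    try:
        with open(modules_file) as f:
            names = parse_module_names(f.read())
    except OSError:
        return []
    return [name for name in names if name.startswith("nvidia")]


def driver_version(version_file="/proc/driver/nvidia/version"):
    """Return the first line of the driver version file, or None if it is absent."""
    try:
        with open(version_file) as f:
            return f.readline().strip()
    except OSError:
        return None


if __name__ == "__main__":
    modules = loaded_nvidia_modules()
    if modules:
        print("Loaded NVIDIA modules:", ", ".join(modules))
        print("Driver version:", driver_version() or "unknown")
        print("The driver is loaded; a reinstall is probably not needed.")
    else:
        print("No NVIDIA kernel module is loaded.")
```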