What needs to be mounted by the Kubernetes device plugin in order to use a GPU inside containers?


I have built a GPU device plugin for Kubernetes. The plugin allocates GPU devices, but the GPU drivers are not detected inside the container. As far as I know, I need to mount several directories for the container to detect the Nvidia drivers.

I am using the nvidia/cuda:12.0.0-devel-ubuntu22.04 Docker image, so CUDA is detected, but I am not sure which directories the device plugin needs to mount for the Nvidia drivers. I have tried mounting /usr/local/nvidia, but that gives me a CreateContainerError. Any suggestions?

Answer by Richard Rublev

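Rather than mounting driver directories by hand, you should install nvidia-container-toolkit on every GPU node. Its runtime injects the host's NVIDIA driver libraries and device files into the container.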

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
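To see what the toolkit would actually make available inside a container (which is roughly the question being asked), nvidia-container-cli can help. This is only a sketch: it assumes nvidia-container-cli was pulled in with the toolkit package, and the `have` helper is just a name I made up for the check.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

if have nvidia-container-cli; then
    # Driver and GPU details as seen by libnvidia-container.
    nvidia-container-cli info
    # Driver libraries, binaries and device files that can be injected into a container.
    nvidia-container-cli list
else
    echo "nvidia-container-cli not found; is nvidia-container-toolkit installed?"
fi
```

The output of the list command is a good reference if you still want your own plugin to mount things itself.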

Next, edit /etc/docker/daemon.json so that Docker uses the NVIDIA runtime by default:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
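Docker only picks up this change after a restart. Below is a minimal sketch that checks the file before restarting; `has_nvidia_default_runtime` is a hypothetical helper name, and the simple grep assumes the key and value sit on one line as in the JSON above.

```shell
# Hypothetical helper: succeed if the given daemon.json sets "default-runtime" to "nvidia".
has_nvidia_default_runtime() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Use the first argument if given, otherwise the standard Docker config path.
config="${1:-/etc/docker/daemon.json}"
if has_nvidia_default_runtime "$config" 2>/dev/null; then
    echo "nvidia is the default runtime in $config"
    # Restart Docker so the new runtime takes effect, then confirm:
    #   sudo systemctl restart docker
    #   docker info | grep -i 'default runtime'
else
    echo "default-runtime is not set to nvidia in $config"
fi
```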

Then install the NVIDIA device plugin:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
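To check that the drivers are now visible from a pod, something like the following should work. It is a sketch under assumptions: the pod name gpu-test and the file name gpu-test-pod.yaml are made up, and nvidia.com/gpu is the resource name advertised by NVIDIA's plugin (a custom plugin may register a different one).

```shell
# Write a hypothetical test pod that requests one GPU and runs nvidia-smi.
cat > gpu-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.0.0-devel-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Then, against a running cluster:
#   kubectl describe nodes | grep nvidia.com/gpu   # nodes should advertise the resource
#   kubectl apply -f gpu-test-pod.yaml
#   kubectl logs gpu-test                          # nvidia-smi output once the pod has run
```

If nvidia-smi prints the GPU table in the pod logs, the driver is being injected correctly.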

You can read more at https://github.com/NVIDIA/k8s-device-plugin