Docker NVIDIA GPU Passthrough intermittently fails after random periods


I'm currently using GPU passthrough in a Docker container by running the following command:

sudo docker run -d \
  --name=jellyfin \
  --runtime=nvidia \
  --gpus all \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  jellyfin/jellyfin:latest

Initially, the setup works fine and the container can access the GPU without issues, as shown in this screenshot.

GPU Passthrough Working
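(For anyone wanting to reproduce the check without the screenshot: a minimal way to confirm the container can still see the GPU from the host is something like the following, assuming the NVIDIA Container Toolkit injects nvidia-smi into the container, which it normally does with NVIDIA_DRIVER_CAPABILITIES=all.)

# Quick check from the host that the running container still sees the GPU.
# nvidia-smi inside the container is assumed to be provided by the
# NVIDIA Container Toolkit.
sudo docker exec jellyfin nvidia-smi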

However, after a random amount of time (anywhere from a day to a week), the GPU passthrough fails and the container can no longer use the GPU, as shown in this screenshot. (Restarting the Docker container fixes it again.)

GPU Passthrough not working
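(For completeness, the workaround mentioned above is nothing more than restarting the container:)

# Temporary workaround: restarting the container restores GPU access.
sudo docker restart jellyfin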

If anyone has any ideas or can help me out with this, I'd appreciate it.

Environment details:
OS: Manjaro Linux
Docker version: 25.0.4, build 1a576c5

I also looked at the Docker logs, and apart from the errors below about not being able to access the GPU, there was nothing of note:

[AVHWDeviceContext @ 0x55b4273e0440] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Device creation failed: -542398533.
Failed to set value 'cuda=cu:0' for option 'init_hw_device': Generic error in an external library
Error parsing global options: Generic error in an external library
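
(For reference, something like the following is enough to pull those lines back out of the container logs; the exact grep pattern is just illustrative.)

# Filter the container logs for the CUDA/hardware-device errors shown above.
sudo docker logs jellyfin 2>&1 | grep -iE 'cuda|hw_device'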
