I have developed a Python application that uses the GPU for computation, and I've created a Docker image to package it. Locally, when I run the Docker image with NVIDIA GPU support using the command:
docker run --rm --runtime=nvidia --gpus all image-name:tag
the application correctly uses the GPU for computation. Likewise, when I run the script directly (python test.py) without Docker, it also runs on the GPU:
Ultralytics YOLOv8.1.28
Python-3.10.14 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4060 Laptop GPU, 7908MiB)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
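For context, the device selection my test.py effectively performs boils down to a sketch like the one below (simplified; detect_device is just an illustrative name, the real script lets Ultralytics pick the device):

```python
import importlib.util

def detect_device() -> str:
    """Return 'cuda:0' if torch is installed and sees a CUDA device, else 'cpu'.

    This mirrors the device line Ultralytics prints at startup.
    """
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda:0" if torch.cuda.is_available() else "cpu"

print(detect_device())
```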
To achieve the above result within Docker, I had to install nvidia-container-toolkit on my Ubuntu machine.
But the GPU is not used in two cases. 1: when I run the Docker image without the GPU flags above, using the simple command below:
docker run image-name:tag
2: when I deploy the Docker image to an AWS EC2 instance (specifically a g4dn.xlarge with an NVIDIA T4 GPU), where I actually need it.
The output it shows:
Ultralytics YOLOv8.1.28
Python-3.10.14 torch-2.2.1+cu121 CPU (13th Gen Intel Core(TM) i7-13700HX)
Model summary (fused): 168 layers, 3005843 parameters, 0 gradients, 8.1 GFLOPs
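To see what Docker actually hands the container in each case, I can run a quick check inside it (a sketch; with plain docker run and no --gpus flag, the list is expected to come back empty):

```python
import glob

def visible_gpu_devices() -> list:
    """NVIDIA device nodes (/dev/nvidia0, /dev/nvidiactl, ...) that Docker
    mapped into this container; empty when no GPU was passed through."""
    return sorted(glob.glob("/dev/nvidia*"))

print(visible_gpu_devices())
```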
Here's my Dockerfile:
FROM python:3.10-slim
# Install system dependencies required for ultralytics and OpenCV
RUN apt-get update && apt-get install -y \
libgl1-mesa-glx \
libglib2.0-0
WORKDIR /app
COPY requirements.txt test.py other_files ./
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "test.py"]
On the AWS EC2 instance, I've verified that the GPU is accessible. However, even after deploying the Docker image to the instance and running it, the application continues to run on the CPU.
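The host-side checks I did boil down to something like this sketch (gpu_host_status is just an illustrative name; it assumes docker and nvidia-smi are on PATH when they are installed):

```python
import shutil
import subprocess

def gpu_host_status() -> dict:
    """Host-side sanity checks: the NVIDIA driver and the nvidia Docker
    runtime both need to be present before --gpus all can work."""
    status = {
        "driver_installed": shutil.which("nvidia-smi") is not None,
        "nvidia_runtime": False,
    }
    docker = shutil.which("docker")
    if docker is not None:
        info = subprocess.run([docker, "info"], capture_output=True, text=True)
        status["nvidia_runtime"] = "nvidia" in info.stdout.lower()
    return status

print(gpu_host_status())
```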
The AWS EC2 instance I'm using is g4dn.xlarge, with this configuration:
4 vCPUs, 16 GB RAM, NVIDIA T4 Tensor Core GPU (16 GB)
Can someone please help me understand why the Docker image is not utilizing the GPU on the AWS EC2 instance and how I can resolve this issue? Thank you.
I have tried changing the Dockerfile to the following:
FROM python:3.10-slim
RUN apt-get update && apt-get install -y \
curl \
gnupg \
lsb-release \
wget \
libgl1-mesa-glx \
libglib2.0-0 \
software-properties-common
RUN curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
RUN echo "deb http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu focal main" > /etc/apt/sources.list.d/graphics-drivers.list \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys FCAE110B1118213C
RUN apt-get update
RUN apt-get install -y nvidia-container-toolkit
RUN nvidia-ctk runtime configure --runtime=docker
WORKDIR /app
COPY requirements.txt test.py other_files ./
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "test.py"]
But no luck so far.