We usually run our algorithm servers on Linux, typically Ubuntu, CentOS, or Huawei's EulerOS. Some small shops, however, deploy their algorithm servers directly on Windows. If you encounter a company still doing that, it is advisable to stay away: they are either inexperienced or have serious problems, the kind of situation we call a "菜坑" (literally a "vegetable pit", an amateurish trap).
Now I will introduce how to deploy a GPU Docker algorithm environment on Ubuntu. First, your machine must have an NVIDIA GPU, preferably at least a 4090. Second, you need a Docker environment, for example because you want to install Dify, Ollama, or vLLM with Docker, or because your own algorithm service needs Docker plus a GPU for inference.
It is important to note that the GPU is hardware: if there is no physical GPU in the machine, Docker cannot virtualize one for you, and emulating a GPU on the CPU performs far too poorly to be useful. It is the same as networking: if the machine is not connected to the internet, Docker cannot get you online. What Docker can do with a physical GPU is pass it through to containers and partition its use, for example restricting a container to specific GPUs, a subset of GPU cores, or a certain amount of memory.
Now let's start with the hands-on commands, using a specific GPU in this example.
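For example, once the NVIDIA Container Toolkit (installed in the steps below) is in place, Docker's `--gpus` flag controls which physical GPUs a container sees. The CUDA image tag here is only an illustration; pick one that matches your driver:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi          # expose all GPUs
docker run --rm --gpus 1 nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi            # expose any one GPU
docker run --rm --gpus '"device=0"' nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi # expose only GPU 0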
To use a GPU with Docker, you must install the NVIDIA driver, the NVIDIA Container Toolkit, and the CUDA Toolkit.
- 1. Install the NVIDIA 535.183.01 driver. Installation tutorial link: NVIDIA Driver Installation
- 2. Install Docker 28.0.4. Installation tutorial link: Install Docker on Ubuntu 22.04
- 3. Install the NVIDIA Container Toolkit (see reference [1]); a condensed command sketch follows this list.
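As a condensed sketch of step 3 on Ubuntu (the full repository setup is in reference [1]; this assumes the NVIDIA apt repository is already configured):
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker   # register the NVIDIA runtime with Docker
sudo systemctl restart docker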
Environment Setup
1. Choose the container image version.
- • NGC image website, see reference [2].
- • Use `nvidia-smi` to check the highest CUDA version your driver supports and pick an NGC tag that matches; otherwise compatibility issues may arise (see the check after this list).
- • Prefer the latest NGC release to minimize the chance of errors.
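The header of the `nvidia-smi` output shows the highest CUDA version the installed driver supports; the figures below are illustrative:
nvidia-smi
# Driver Version: 535.183.01    CUDA Version: 12.2
# Any NGC tag built against CUDA 12.2 or lower, e.g. tensorrt:23.04-py3, should run on this driver.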
2. Pull the image
docker pull nvcr.io/nvidia/tensorrt:xx.xx-py3
3. Run the container image
- • If using Docker 19.03 or higher:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorrt:xx.xx-py3 /bin/bash
- • If using Docker 19.02 or older:
nvidia-docker run -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorrt:xx.xx-py3 /bin/bash
Where:
- • `-it` runs the container in interactive mode.
- • `--rm` removes the container when it exits.
- • `xx.xx` is the container version, for example, 23.04.
- • `-v` mounts a directory: `local_dir` is the directory on the host and `container_dir` is the directory inside the container.
- • `--device` mounts other devices; `/dev/video0` here is a mounted camera.
- • Personal usage command:
sudo docker run --gpus all -it --rm --device=/dev/video0:/dev/video0 -v /home/albert/tensorrt:/tensorrt [Image ID] /bin/bash
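Once inside the container, it is worth a quick sanity check that the GPU is visible; a minimal sketch (the Python check assumes the `tensorrt` package bundled with the TensorRT image):
nvidia-smi                                                  # should list the host GPU(s)
python3 -c "import tensorrt; print(tensorrt.__version__)"   # confirm TensorRT is importable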
Recommended: GPU Error Resolution Tutorial (see reference [3])
Configuration of the GPU Docker algorithm environment on CentOS 7
First, confirm that the host has an NVIDIA graphics card; mine is a 4090. A quick check is shown below.
Then install the graphics card driver and CUDA; I installed CUDA 12.2.
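A quick way to confirm the card and driver with standard Linux tools (the output will of course vary with your setup):
lspci | grep -i nvidia   # lists NVIDIA PCI devices even before the driver is installed
nvidia-smi               # works once the driver is installed; shows driver and CUDA versions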

4. Algorithm Execution
The basic algorithm environment configuration is now complete. After installing the packages listed in requirements.txt, you can run Dify, Ollama, or YOLO (for example) normally, as sketched below.
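A minimal sketch, assuming your project lives in the mounted volume and ships a requirements.txt (the project path is hypothetical):
cd /tensorrt/my_project            # hypothetical project directory inside the mounted volume
pip install -r requirements.txt    # install the project's Python dependencies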
Now let's look at a configuration example using the Ollama Docker GPU environment.
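The typical GPU run command from Ollama's Docker instructions looks like this (the volume name, port, and container name are the documented defaults and can be adjusted):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama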

Then let's look at the configuration in actual use.
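As one illustration of using the configured container (the model name is only an example; any model from the Ollama library works):
docker exec -it ollama ollama run llama3   # pull the model if needed and start an interactive chat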

Reference Links
[1] NVIDIA Container Toolkit Installation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
[2] NGC Image Website: https://docs.nvidia.cn/deeplearning/tensorrt/container-release-notes/index.html#rel-23-04
[3] GPU Error Resolution Tutorial: https://blog.csdn.net/weixin_55035321/article/details/132648178