Set Up an Nvidia GPU for Docker

December 11th 2023

Requirements

  1. Operating System
    1. Debian
    2. Ubuntu
  2. Nvidia GPU, 700 series or newer
  3. Docker

Getting Started

Update the OS

sudo apt update
sudo apt full-upgrade

Install Docker

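Install Docker Engine first if it is not already installed. Docker's official documentation has installation guides for both Debian and Ubuntu.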

Prepare for the Nvidia Drivers

Install required packages

sudo apt install linux-headers-$(uname -r) make pciutils wget libc-dev libc6-dev gcc g++
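To catch a missing package before the installer fails partway through, a quick check can confirm the build tools are on PATH. This is a minimal sketch: missing_tools is a hypothetical helper, and the tool names you pass are an assumption based on the packages above (lspci is provided by pciutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of any commands not found on PATH.
# Empty output (and exit status 0) means every tool was found.
missing_tools() {
  local missing="" tool
  for tool in "$@"; do
    # command -v succeeds only if the command can be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="${missing:+$missing }$tool"
    fi
  done
  echo "$missing"
  [ -z "$missing" ]
}
```

For example, missing_tools make gcc g++ wget lspci prints nothing and succeeds when everything is installed.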

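Blacklist the Nouveau drivers. The file is owned by root, so use sudo tee instead of a plain >> redirect, which would run without root privileges and fail.

echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf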
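If you re-run this guide, appending the line again leaves duplicates in the file. Below is a sketch of an idempotent version, assuming you run it as root (for example from a sudo -i shell); blacklist_nouveau is a hypothetical helper name, and its path argument defaults to the file used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "blacklist nouveau" only if it is not already present.
# Writing to /etc/modprobe.d requires root.
blacklist_nouveau() {
  local conf="${1:-/etc/modprobe.d/blacklist.conf}"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if grep -qxF 'blacklist nouveau' "$conf" 2>/dev/null; then
    return 0
  fi
  echo 'blacklist nouveau' >> "$conf"
}
```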

Regenerate the kernel initramfs

sudo update-initramfs -u

Backup GRUB

sudo cp /etc/default/grub /etc/default/grub.bak

Edit GRUB

sudo nano /etc/default/grub

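Find the line GRUB_CMDLINE_LINUX and edit it to look like the line below. If it already contains other parameters, keep them and add the new ones inside the quotes.

GRUB_CMDLINE_LINUX="quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"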
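As an alternative to editing in nano, the parameters can be appended with sed. This sketch assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX line; add_grub_params is a hypothetical helper that appends rd.driver.blacklist=nouveau and modprobe.blacklist=nouveau while keeping whatever parameters are already on the line. Try it on a copy of /etc/default/grub first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the nouveau blacklist parameters to
# GRUB_CMDLINE_LINUX, keeping existing parameters. Assumes GNU sed.
add_grub_params() {
  local grub="${1:-/etc/default/grub}"
  local params='rd.driver.blacklist=nouveau modprobe.blacklist=nouveau'
  # Skip if a previous run already added the parameters
  if grep -q 'modprobe.blacklist=nouveau' "$grub"; then
    return 0
  fi
  # Insert the parameters before the closing quote, then remove the
  # leading space left behind when the original value was empty.
  # GRUB_CMDLINE_LINUX_DEFAULT is untouched because it lacks =" right after the name.
  sed -i -E \
    -e "s/^GRUB_CMDLINE_LINUX=\"([^\"]*)\"/GRUB_CMDLINE_LINUX=\"\1 ${params}\"/" \
    -e 's/^GRUB_CMDLINE_LINUX=" /GRUB_CMDLINE_LINUX="/' \
    "$grub"
}
```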

Save with Ctrl+O (press Enter to confirm the file name), then exit with Ctrl+X.

Regenerate GRUB

sudo grub-mkconfig -o /boot/grub/grub.cfg

Reboot (for example with sudo reboot) so the new GRUB configuration and initramfs take effect.
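After the reboot, you can confirm that nouveau is no longer loaded. lsmod | grep nouveau does the same job interactively; the sketch below reads /proc/modules directly (the file lsmod formats) and takes the path as an argument so it can be tried against a sample file. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the nouveau module is currently loaded.
# Each line of /proc/modules starts with a module name followed by a space.
nouveau_loaded() {
  local modules="${1:-/proc/modules}"
  grep -q '^nouveau ' "$modules"
}

# Hypothetical helper: print a short status message for the check above.
report_nouveau() {
  if nouveau_loaded "$@"; then
    echo "nouveau is still loaded; recheck the blacklist and GRUB steps" >&2
    return 1
  fi
  echo "nouveau is not loaded"
}
```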

Download Nvidia Drivers

Double-check which GPU you have with lspci.

lspci | grep VGA

The output might look like the following. In this case the GPU is an Nvidia Quadro T400, specifically the TU117GLM chip.

01:00.0 VGA compatible controller: NVIDIA Corporation TU117GLM [Quadro T400 Mobile] (rev a1)
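To pull just the model name out of that line, a small filter can help. It assumes plain lspci output (without -nn, which adds bracketed PCI IDs) and, as an assumption beyond the command above, also accepts "3D controller" lines, which some Nvidia cards report instead of VGA. nvidia_gpu_names is a hypothetical helper; use it as lspci | nvidia_gpu_names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lspci output on stdin and print the bracketed
# model name of each Nvidia VGA or 3D controller.
nvidia_gpu_names() {
  grep -iE '(VGA compatible|3D) controller: NVIDIA' |
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```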

Go to the Nvidia driver site at https://www.nvidia.com/download/index.aspx.
Use the form to find your GPU, and make sure Operating System is set to Linux 64-bit and Download Type is set to Production Branch.

Nvidia driver search form

Click the Search button. On the next page, note the Version and Release Date fields, then click the Download button. On the Download page that follows, right-click the Agree & Download button and select Copy Link. The copied link should look something like the one below, with the version at the end matching the one you noted.

https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run

Download the driver file to your Linux machine using wget. If you copy the command below, make sure to replace the URL with the one you copied!

wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run
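Because the version appears twice in the link, it is easy to update only one of them. The sketch below builds the URL from a single version string. The pattern is inferred from the one example link above and is an assumption; if a download fails, fall back to the link copied from Nvidia's site. nvidia_driver_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the driver URL from a version string.
# The URL pattern is inferred from a single example and may not hold for every release.
nvidia_driver_url() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: nvidia_driver_url VERSION" >&2
    return 1
  fi
  echo "https://us.download.nvidia.com/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}
```

For example: wget "$(nvidia_driver_url 535.146.02)"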

The output should look similar to this:

--2023-12-11 23:46:32-- https://us.download.nvidia.com/XFree86/Linux-x86_64/535.146.02/NVIDIA-Linux-x86_64-535.146.02.run
Resolving us.download.nvidia.com (us.download.nvidia.com)... 192.229.211.70, 2606:2800:21f:3aa:dcf:37b:1ed6:1fb
Connecting to us.download.nvidia.com (us.download.nvidia.com)|192.229.211.70|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 341737575 (326M) [application/octet-stream]
Saving to: ‘NVIDIA-Linux-x86_64-535.146.02.run’

NVIDIA-Linux-x86_64-535.146.02.run 100%[============================================================================>] 325.91M 73.9MB/s in 4.3s

2023-12-11 23:46:37 (75.2 MB/s) - ‘NVIDIA-Linux-x86_64-535.146.02.run’ saved [341737575/341737575]

Verify the file downloaded

ls -halt

You should see a file like below

-rw-r--r-- 1 user group 326M Dec 4 02:53 NVIDIA-Linux-x86_64-535.146.02.run

Make the installer executable. Update the file name in the command below to match your download.

chmod +x NVIDIA-Linux-x86_64-535.146.02.run

Run ls -l again to verify that the file now has execute permissions (for example, -rwxr-xr-x).
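The checks in this section can be combined into one step. This sketch confirms the file exists, matches the byte count wget printed on its Length: line (341737575 in the example above), and is executable. check_installer is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_installer FILE EXPECTED_BYTES
# Verifies existence, exact size in bytes, and the executable bit.
check_installer() {
  local file="$1" expected_bytes="$2" actual
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  # wc -c counts bytes; compare against wget's Length: value
  actual=$(wc -c < "$file")
  if [ "$actual" -ne "$expected_bytes" ]; then
    echo "size mismatch: got $actual, expected $expected_bytes" >&2
    return 1
  fi
  [ -x "$file" ] || { echo "not executable: $file" >&2; return 1; }
  echo "ok: $file"
}
```

For example: check_installer NVIDIA-Linux-x86_64-535.146.02.run 341737575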

Install Nvidia Driver

Run the installer

sudo ./NVIDIA-Linux-x86_64-535.146.02.run

Follow the prompts. If you don't have a desktop environment or are running a headless system, decline the options to run nvidia-xconfig or otherwise update the X.org configuration. The 32-bit compatibility libraries are usually not required either.

Verify the driver installed

nvidia-smi

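The output should look like the following. Note the Driver Version and CUDA Version, and make sure they are appropriate for the service that will use the GPU. The CUDA Version shown is the highest CUDA version the driver supports, not an installed CUDA toolkit. The sample output below was captured on a system running driver 545.29.02; your Driver Version should match the driver you installed.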

Tue Dec 12 01:01:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA T400                    Off | 00000000:01:00.0 Off |                  N/A |
| 53%   48C    P0              N/A /  31W |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
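To check the versions from a script rather than by eye, the banner line can be parsed and the driver compared against the minimum your service needs. This is a sketch: smi_versions and version_ge are hypothetical helpers, version_ge relies on GNU sort -V, and the minimum version you pass is your own choice. On a real system, nvidia-smi | smi_versions prints both versions, and nvidia-smi --query-gpu=driver_version --format=csv,noheader reports the driver version alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print
# "<driver version> <cuda version>" from the banner line.
smi_versions() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p'
}

# Hypothetical helper: succeed if dotted version $1 >= version $2.
# sort -V orders version strings numerically field by field.
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```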

Install Nvidia Container Toolkit

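Add the Nvidia Container Toolkit repository to apt. The command needs curl and gpg; install them with sudo apt install curl gnupg if they are missing.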

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
 && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
 sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
 sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
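The sed step in that command rewrites each deb line with a signed-by option, so apt accepts this repository only when its metadata is signed by the key saved in the keyring file. The function below applies the same substitution so you can see its effect. add_signed_by is a hypothetical helper, and the sample line in the usage note is illustrative, not necessarily the current contents of the real list file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the same signed-by substitution as the
# repository setup command to deb lines read on stdin.
add_signed_by() {
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'
}
```

For example: echo 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | add_signed_by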

Install the Nvidia Container Toolkit from apt

sudo apt update
sudo apt install nvidia-container-toolkit -y

Configure Docker to use the toolkit

sudo nvidia-ctk runtime configure --runtime=docker
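nvidia-ctk records the runtime in Docker's daemon configuration, normally /etc/docker/daemon.json, as a runtimes entry named nvidia whose path is nvidia-container-runtime. The sketch below is a rough grep check, not a JSON parser; has_nvidia_runtime is a hypothetical helper whose path argument defaults to that file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the daemon config references
# nvidia-container-runtime. A plain text search, not JSON parsing.
has_nvidia_runtime() {
  local conf="${1:-/etc/docker/daemon.json}"
  grep -q '"nvidia-container-runtime"' "$conf"
}
```

After restarting Docker, docker info should also list nvidia on its Runtimes line.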

Restart docker

sudo systemctl restart docker

Verify Docker can use the GPU

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The result should look like the following. The Driver Version and CUDA Version should match what you saw on the host.

Tue Dec 12 06:02:37 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA T400                    Off | 00000000:01:00.0 Off |                  N/A |
| 53%   47C    P0              N/A /  31W |      0MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
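The versions match because the toolkit injects the host's driver libraries and utilities, including nvidia-smi, into the container; the plain ubuntu image does not ship them. To compare the two automatically, a sketch like this extracts the Driver Version from each output. driver_version and same_driver are hypothetical helpers that take the nvidia-smi text as input, so no GPU is needed to try them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first Driver Version found on stdin.
driver_version() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n1
}

# Hypothetical helper: same_driver HOST_OUTPUT CONTAINER_OUTPUT
# Succeeds only if both outputs contain the same non-empty driver version.
same_driver() {
  local host container
  host=$(printf '%s\n' "$1" | driver_version)
  container=$(printf '%s\n' "$2" | driver_version)
  [ -n "$host" ] && [ "$host" = "$container" ]
}
```

For example: same_driver "$(nvidia-smi)" "$(sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi)" && echo "versions match"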

Docker Compose

The same command can be run with a Docker Compose file like the one below. Make sure to update the timezone (TZ).

version: '3'
services:
  ubuntu:
    environment:
      - TZ=America/New_York
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
    # Use the runtime registered by nvidia-ctk
    runtime: nvidia
    # Reserve all Nvidia GPUs for this service
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    image: ubuntu
    command: nvidia-smi

A minimal Docker Compose file, without the deploy section, also works:

version: '3'
services:
  ubuntu:
    environment: 
      - TZ=America/New_York 
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility 
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia
    image: ubuntu
    command: nvidia-smi

Put the contents of either code block into a docker-compose.yaml file and run it with the command below

docker compose up 
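To create the file without opening an editor, a heredoc works. This sketch writes the minimal Compose file from above into a directory you choose (the current directory by default); write_minimal_compose is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the minimal Compose file into directory $1.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
write_minimal_compose() {
  local dir="${1:-.}"
  cat > "$dir/docker-compose.yaml" <<'EOF'
version: '3'
services:
  ubuntu:
    environment:
      - TZ=America/New_York
      - NVIDIA_DRIVER_CAPABILITIES=compute,video,utility
      - NVIDIA_VISIBLE_DEVICES=all
    runtime: nvidia
    image: ubuntu
    command: nvidia-smi
EOF
}
```

Running docker compose config in that directory afterwards parses the file and prints the resolved configuration, which surfaces indentation mistakes before docker compose up.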

Watch the log output for any errors.

This post is written by Gouthaman Raveendran, licensed under CC BY-NC 4.0.