Proxmox LXC, docker GPU

Hi Guys,

Recently I found an old GTX 750 Ti lying around and thought I could use it for Frigate NVR, which I run in Docker.

I am running Proxmox 7 with an Ubuntu 20.04 LXC that runs Docker, and I need to pass the 750 Ti through to Docker. So far I have the GPU working inside the LXC (nvidia-smi sees it), but I can't run a GPU container.

Output of nvidia-container-cli -k -d /dev/tty info (inside the LXC):

-- WARNING, the following logs are for debugging purposes only --

I1125 17:37:13.050875 74779 nvc.c:372] initializing library context (version=1.6.0, build=dd2c49d6699e4d8529fbeaa58ee91554977b652e)
I1125 17:37:13.051118 74779 nvc.c:346] using root /
I1125 17:37:13.051139 74779 nvc.c:347] using ldcache /etc/ld.so.cache
I1125 17:37:13.051155 74779 nvc.c:348] using unprivileged user 65534:65534
I1125 17:37:13.051196 74779 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1125 17:37:13.051515 74779 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I1125 17:37:13.054990 74780 nvc.c:274] loading kernel module nvidia
I1125 17:37:13.055505 74780 nvc.c:278] running mknod for /dev/nvidiactl
I1125 17:37:13.055584 74780 nvc.c:282] running mknod for /dev/nvidia0
I1125 17:37:13.055641 74780 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I1125 17:37:13.070350 74780 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I1125 17:37:13.070615 74780 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I1125 17:37:13.075360 74780 nvc.c:292] loading kernel module nvidia_uvm
I1125 17:37:13.075631 74780 nvc.c:296] running mknod for /dev/nvidia-uvm
I1125 17:37:13.075813 74780 nvc.c:301] loading kernel module nvidia_modeset
I1125 17:37:13.076056 74780 nvc.c:305] running mknod for /dev/nvidia-modeset
I1125 17:37:13.076621 74781 driver.c:101] starting driver service
I1125 17:37:13.594511 74779 nvc_info.c:758] requesting driver information with ''
I1125 17:37:13.597614 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.470.86
I1125 17:37:13.597894 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.470.86
I1125 17:37:13.598052 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.470.86
I1125 17:37:13.598176 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.86
I1125 17:37:13.598310 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.470.86
I1125 17:37:13.598527 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.470.86
I1125 17:37:13.598750 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.470.86
I1125 17:37:13.598887 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.470.86
I1125 17:37:13.599010 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.86
I1125 17:37:13.599217 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.470.86
I1125 17:37:13.599394 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.470.86
I1125 17:37:13.599506 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.470.86
I1125 17:37:13.599678 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.470.86
I1125 17:37:13.599815 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.470.86
I1125 17:37:13.599989 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.470.86
I1125 17:37:13.600153 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.470.86
I1125 17:37:13.600271 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.470.86
I1125 17:37:13.600396 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.470.86
I1125 17:37:13.600589 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.470.86
I1125 17:37:13.600765 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.470.86
I1125 17:37:13.600948 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.470.86
I1125 17:37:13.601341 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.470.86
I1125 17:37:13.601616 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.470.86
I1125 17:37:13.601807 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.470.86
I1125 17:37:13.601946 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.470.86
I1125 17:37:13.602073 74779 nvc_info.c:171] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.470.86
W1125 17:37:13.602139 74779 nvc_info.c:397] missing library libnvidia-nscq.so
W1125 17:37:13.602157 74779 nvc_info.c:397] missing library libnvidia-fatbinaryloader.so
W1125 17:37:13.602170 74779 nvc_info.c:401] missing compat32 library libnvidia-ml.so
W1125 17:37:13.602185 74779 nvc_info.c:401] missing compat32 library libnvidia-cfg.so
W1125 17:37:13.602200 74779 nvc_info.c:401] missing compat32 library libnvidia-nscq.so
W1125 17:37:13.602222 74779 nvc_info.c:401] missing compat32 library libcuda.so
W1125 17:37:13.602237 74779 nvc_info.c:401] missing compat32 library libnvidia-opencl.so
W1125 17:37:13.602254 74779 nvc_info.c:401] missing compat32 library libnvidia-ptxjitcompiler.so
W1125 17:37:13.602272 74779 nvc_info.c:401] missing compat32 library libnvidia-fatbinaryloader.so
W1125 17:37:13.602285 74779 nvc_info.c:401] missing compat32 library libnvidia-allocator.so
W1125 17:37:13.602303 74779 nvc_info.c:401] missing compat32 library libnvidia-compiler.so
W1125 17:37:13.602320 74779 nvc_info.c:401] missing compat32 library libnvidia-ngx.so
W1125 17:37:13.602333 74779 nvc_info.c:401] missing compat32 library libvdpau_nvidia.so
W1125 17:37:13.602348 74779 nvc_info.c:401] missing compat32 library libnvidia-encode.so
W1125 17:37:13.602366 74779 nvc_info.c:401] missing compat32 library libnvidia-opticalflow.so
W1125 17:37:13.602384 74779 nvc_info.c:401] missing compat32 library libnvcuvid.so
W1125 17:37:13.602401 74779 nvc_info.c:401] missing compat32 library libnvidia-eglcore.so
W1125 17:37:13.602421 74779 nvc_info.c:401] missing compat32 library libnvidia-glcore.so
W1125 17:37:13.602443 74779 nvc_info.c:401] missing compat32 library libnvidia-tls.so
W1125 17:37:13.602458 74779 nvc_info.c:401] missing compat32 library libnvidia-glsi.so
W1125 17:37:13.602472 74779 nvc_info.c:401] missing compat32 library libnvidia-fbc.so
W1125 17:37:13.602488 74779 nvc_info.c:401] missing compat32 library libnvidia-ifr.so
W1125 17:37:13.602502 74779 nvc_info.c:401] missing compat32 library libnvidia-rtcore.so
W1125 17:37:13.602517 74779 nvc_info.c:401] missing compat32 library libnvoptix.so
W1125 17:37:13.602533 74779 nvc_info.c:401] missing compat32 library libGLX_nvidia.so
W1125 17:37:13.602550 74779 nvc_info.c:401] missing compat32 library libEGL_nvidia.so
W1125 17:37:13.602565 74779 nvc_info.c:401] missing compat32 library libGLESv2_nvidia.so
W1125 17:37:13.602579 74779 nvc_info.c:401] missing compat32 library libGLESv1_CM_nvidia.so
W1125 17:37:13.602594 74779 nvc_info.c:401] missing compat32 library libnvidia-glvkspirv.so
W1125 17:37:13.602613 74779 nvc_info.c:401] missing compat32 library libnvidia-cbl.so
I1125 17:37:13.603444 74779 nvc_info.c:297] selecting /usr/bin/nvidia-smi
I1125 17:37:13.603509 74779 nvc_info.c:297] selecting /usr/bin/nvidia-debugdump
I1125 17:37:13.603565 74779 nvc_info.c:297] selecting /usr/bin/nvidia-persistenced
I1125 17:37:13.603655 74779 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-control
I1125 17:37:13.603713 74779 nvc_info.c:297] selecting /usr/bin/nvidia-cuda-mps-server
W1125 17:37:13.603930 74779 nvc_info.c:423] missing binary nv-fabricmanager
I1125 17:37:13.604008 74779 nvc_info.c:341] listing firmware path /lib/firmware/nvidia/470.86
I1125 17:37:13.604087 74779 nvc_info.c:520] listing device /dev/nvidiactl
I1125 17:37:13.604107 74779 nvc_info.c:520] listing device /dev/nvidia-uvm
I1125 17:37:13.604122 74779 nvc_info.c:520] listing device /dev/nvidia-uvm-tools
I1125 17:37:13.604144 74779 nvc_info.c:520] listing device /dev/nvidia-modeset
W1125 17:37:13.604233 74779 nvc_info.c:347] missing ipc path /var/run/nvidia-persistenced/socket
W1125 17:37:13.604300 74779 nvc_info.c:347] missing ipc path /var/run/nvidia-fabricmanager/socket
W1125 17:37:13.604351 74779 nvc_info.c:347] missing ipc path /tmp/nvidia-mps
I1125 17:37:13.604373 74779 nvc_info.c:814] requesting device information with ''
I1125 17:37:13.611141 74779 nvc_info.c:705] listing device /dev/nvidia0 (GPU-e9647136-a56a-fc90-2806-54197d6dc27b at 00000000:03:00.0)
NVRM version:   470.86
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce GTX 750 Ti
Brand:          GeForce
GPU UUID:       GPU-e9647136-a56a-fc90-2806-54197d6dc27b
Bus Location:   00000000:03:00.0
Architecture:   5.0
I1125 17:37:13.611246 74779 nvc.c:423] shutting down library context
I1125 17:37:13.637276 74781 driver.c:163] terminating driver service
I1125 17:37:13.638161 74779 driver.c:203] driver service terminated successfully

nvidia-smi output inside the LXC:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 34%   31C    P0     1W /  38W |      0MiB /  2002MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

But when I try: docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

I get:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: container error: cgroup subsystem devices not found: unknown.
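For what it's worth, I believe the cgroup version in use inside the LXC can be checked like this (it should print cgroup2fs on the unified v2 hierarchy, tmpfs on the old one):

stat -fc %T /sys/fs/cgroup/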

Contents of /etc/nvidia-container-runtime/config.toml:

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = false   # also tried true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
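As far as I understand, with no-cgroups = true the toolkit stops setting up the devices cgroup itself, so the device nodes then have to be handed to the container explicitly. I assume the test command would have to look roughly like this in that case (untested sketch, device paths taken from my LXC):

docker run --rm --gpus all \
  --device /dev/nvidia0 \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools \
  nvidia/cuda:11.0-base nvidia-smi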

Does anyone have an idea how to fix this?

Thanks, Dennis

It seems like Docker isn't able to create the cgroups. What does your Docker config JSON (daemon.json) look like?

Also, please post the output of docker info so we can see whether it's using cgroups v2.
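For reference, registering the NVIDIA runtime in /etc/docker/daemon.json normally looks something like this (roughly what the nvidia-container-runtime setup instructions show; adjust for your install):

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}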

Thanks for your reply,

Output of my LXC config:

lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop: 
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Output of ls -l /dev/nvidia*:

---------- 1 root root        0 Nov 26 12:57 /dev/nvidia-modeset
crw-rw-rw- 1 root root 507,   0 Nov 24 18:31 /dev/nvidia-uvm
crw-rw-rw- 1 root root 507,   1 Nov 24 18:31 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Nov 24 18:31 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Nov 24 18:31 /dev/nvidiactl

Docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.9.0)

Server:
 Containers: 10
  Running: 8
  Paused: 0
  Stopped: 2
 Images: 20
 Server Version: 20.10.10
 Storage Driver: fuse-overlayfs
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local local-persist
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.13.19-1-pve
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 4GiB
 Name: apollo2
 ID: Y3N5:5MK7:WHFF:X7OS:PAIY:4VBP:7V3E:JZ2T:OZEA:4NUP:JTXP:37LJ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
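docker info shows Cgroup Version: 2, so the LXC (and the Proxmox 7 host) is on the unified cgroup hierarchy, which lines up with the error above. From what I can tell from the Proxmox docs, the host can be booted back into the legacy cgroup v1 hierarchy with a kernel parameter; roughly like this for a GRUB-based install (untested sketch; on hosts booting via systemd-boot the parameter goes into /etc/kernel/cmdline followed by proxmox-boot-tool refresh instead):

# on the Proxmox host
nano /etc/default/grub
# add systemd.unified_cgroup_hierarchy=0 to GRUB_CMDLINE_LINUX_DEFAULT
update-grub
reboot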

For some strange reason it has stopped throwing errors and now accepts the GPU, and I can launch a GPU container. But the GPU just won't pick up any work.
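In case it helps: as far as I understand, the Frigate container also has to request the GPU itself. With a recent docker-compose the standard device reservation block would look something along these lines (sketch only; the service name and image tag are just placeholders, not a confirmed Frigate config):

services:
  frigate:
    image: blakeblackshear/frigate:stable    # placeholder
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

This is the Compose equivalent of docker run --gpus, so it needs a docker-compose version that understands the device reservations syntax.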
