Hi there, I’m trying to pass my GPU through to an LXC container with the help of this guide (h++ps://forums.plex.tv/t/pms-installation-guide-when-using-a-proxmox-5-1-lxc-container/219728). However, it doesn’t work, and I don’t understand LXC well enough to see why…
I have multiple GPUs in my system, and I don’t know how to tell them apart in the output of ls -l /dev/dri. So I passed all of them through, planning to remove them by trial and error until only the one I want was left in my container.
root@pve:~# ls -l /dev/dri
total 0
drwxr-xr-x 2 root root 160 Mar 4 13:05 by-path
crw-rw---- 1 root video 226, 0 Mar 4 13:05 card0
crw-rw---- 1 root video 226, 1 Mar 4 13:05 card1
crw-rw---- 1 root video 226, 2 Mar 4 13:05 card2
crw-rw---- 1 root render 226, 128 Mar 4 13:05 renderD128
crw-rw---- 1 root render 226, 129 Mar 4 13:05 renderD129
crw-rw---- 1 root render 226, 130 Mar 4 13:05 renderD130
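One way to tell the nodes apart is sysfs: each entry under /sys/class/drm has a `dev` file with the same major:minor pair that ls -l shows, plus a `device` symlink to the PCI device behind it, whose address can be matched against the lspci output. (`ls -l /dev/dri/by-path`, the directory at the top of the listing above, typically shows the same mapping as symlinks.) A minimal sketch, assuming the standard sysfs layout; the function name `drm_map` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print "<node> <major:minor> <PCI address>" for every
# DRM card/render node, so /dev/dri entries can be matched against lspci.
# Assumes the standard sysfs layout under /sys/class/drm.
drm_map() {
    drm_dir=${1:-/sys/class/drm}
    for entry in "$drm_dir"/*; do
        name=$(basename "$entry")
        case $name in
            card*-*) continue ;;         # connector entries such as card0-DP-1
            card*|renderD*) ;;           # the nodes that also appear in /dev/dri
            *) continue ;;               # anything else (e.g. "version")
        esac
        [ -r "$entry/dev" ] || continue  # skip entries without a device number
        devnum=$(cat "$entry/dev")
        # "device" links to the PCI device; its last path component is the address.
        pci=$(basename "$(readlink -f "$entry/device")")
        printf '%s %s %s\n' "$name" "$devnum" "$pci"
    done
}

drm_map "$@"
```

On the host, a line such as `card0 226:0 0000:03:00.0` would mean card0 belongs to the GPU that lspci lists at 03:00.0.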
The GPU I actually want to pass through is a Radeon RX 480. (The GT 210 is only there because it is the cheapest way to play around with CUDA. I would like to pass that one through to another container later on, but that’s a different story.)
root@pve:~# lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7)
05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
07:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
With the previous result in mind, I added all cards with their corresponding IDs (what are these numbers called, e.g. 226:0, 226:1?) to my /etc/pve/lxc/400.conf like this:
lxc.cgroup.devices.allow: c 226:0 rwm
lxc.cgroup.devices.allow: c 226:1 rwm
lxc.cgroup.devices.allow: c 226:2 rwm
lxc.cgroup.devices.allow: c 226:128 rwm
lxc.cgroup.devices.allow: c 226:129 rwm
lxc.cgroup.devices.allow: c 226:130 rwm
lxc.autodev: 1
lxc.hook.autodev: /var/lib/lxc/400/mount_hook.sh
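The two numbers in each allow line (and in the ls -l listing above) are the device node’s major and minor numbers. Instead of copying them by hand, they can be read with stat, where `%t` and `%T` print the major and minor number in hexadecimal and `-L` follows symlinks such as the ones in /dev/dri/by-path. A minimal sketch with a hypothetical `allow_line` helper; the link names `pci-0000:03:00.0-card` and `pci-0000:03:00.0-render` are an assumption based on udev’s usual by-path naming, with 03:00.0 being the RX 480 according to lspci:

```shell
#!/bin/sh
# Hypothetical helper: print an lxc.cgroup.devices.allow line for a character
# device node, reading its major/minor numbers instead of copying them by hand.
allow_line() {
    # -L follows symlinks (e.g. /dev/dri/by-path/...); %t and %T print the
    # major and minor numbers in hexadecimal, separated by a space.
    hex=$(stat -L -c '%t %T' "$1") || return 1
    major=${hex% *}
    minor=${hex#* }
    # printf accepts 0x-prefixed numbers and prints them in decimal.
    printf 'lxc.cgroup.devices.allow: c %d:%d rwm\n' "0x$major" "0x$minor"
}

# Example (assumed link names): only the nodes of the GPU at PCI 03:00.0.
for node in /dev/dri/by-path/pci-0000:03:00.0-card /dev/dri/by-path/pci-0000:03:00.0-render; do
    if [ -e "$node" ]; then
        allow_line "$node"
    fi
done
```

Run on the host, this would print just the allow lines for that one GPU, which could stand in for the six hand-written ones while narrowing things down.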
As described in the guide, I also added /var/lib/lxc/400/mount_hook.sh as follows:
/var/lib/lxc/400/mount_hook.sh
mkdir -p ${LXC_ROOTFS_MOUNT}/dev/dri
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 0
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 1
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 2
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 128
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 129
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 130
However, when I start the LXC container in the foreground with “lxc-start 400 -F”, I get the following output (with very weird formatting):
root@pve:~# lxc-start 400 -F
lxc-start: 400: cgroups/cgfsng.c: mkdir_eexist_on_last: 1287 File exists - Failed to create directory "/sys/fs/cgroup/unified//lxc/400"
lxc-start: 400: cgroups/cgfsng.c: container_create_path_for_hierarchy: 1336 Failed to create cgroup "/sys/fs/cgroup/unified//lxc/400"
lxc-start: 400: cgroups/cgfsng.c: cgfsng_payload_create: 1496 Failed to create cgroup "/sys/fs/cgroup/unified//lxc/400"
lxc-start: 400: conf.c: run_buffer: 352 Script exited with status 1
lxc-start: 400: conf.c: lxc_setup: 3663 Failed to run autodev hooks
lxc-start: 400: start.c: do_start: 1338 Failed to setup container "400"
lxc-start: 400: sync.c: __sync_wait: 62 An error occurred in another process (expected sequence number 5)
lxc-start: 400: start.c: lxc_abort: 1133 Function not implemented - Failed to send SIGKILL to 16023
lxc-start: 400: start.c: __lxc_start: 2080 Failed to spawn container "400"
lxc-start: 400: tools/lxc_start.c: main: 329 The container failed to start
lxc-start: 400: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
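In the hook as posted, all three `card` lines create card0 and all three `render` lines create renderD128, so the second and third mknod on each name would fail with “File exists”. Since a shell script returns the status of its last command, that may be what produces “Script exited with status 1” above. A minimal sketch of the hook with one name per node, assuming the container should see the same names and minor numbers as the host; the function name `create_dri_nodes` is hypothetical:

```shell
#!/bin/sh
# Sketch of /var/lib/lxc/400/mount_hook.sh with one name per device node.
# Assumption: the container should see the same names and minor numbers as the host.
create_dri_nodes() {
    rootfs=$1
    mkdir -p "$rootfs/dev/dri" || return 1
    for pair in card0:0 card1:1 card2:2 renderD128:128 renderD129:129 renderD130:130; do
        name=${pair%%:*}
        minor=${pair##*:}
        node="$rootfs/dev/dri/$name"
        # Skip nodes that already exist instead of failing with "File exists".
        if [ ! -e "$node" ]; then
            mknod -m 666 "$node" c 226 "$minor" || return 1
        fi
    done
}

# LXC exports LXC_ROOTFS_MOUNT to hooks; only act when it is set.
if [ -n "${LXC_ROOTFS_MOUNT:-}" ]; then
    create_dri_nodes "$LXC_ROOTFS_MOUNT" || exit 1
fi
```

The guard at the bottom means running the script by hand without LXC_ROOTFS_MOUNT does nothing instead of creating nodes under the host’s /dev.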
When I remove “lxc.hook.autodev: /var/lib/lxc/400/mount_hook.sh”, this issue does not happen. However, if I understand correctly, this script is needed to actually use the GPU in the container. Is that correct?
I’m very new to passing GPUs through to containers (and VMs), so any help would be very much appreciated!
Copied from my thread on Reddit, as I did not get any additional information there…