Radeon GPU "Passthrough" with LXC on Proxmox. Help needed

Hi there, I’m trying to pass my GPU through to an LXC container with the help of this guide (h++ps://forums.plex.tv/t/pms-installation-guide-when-using-a-proxmox-5-1-lxc-container/219728). However, I seem to be failing and don’t understand LXC well enough to see why…

I have multiple GPUs in my system and I don’t know how to tell them apart when listing them with ls -l /dev/dri. So I tried to pass all of them, planning to remove them by trial and error until only the one I wanted was left in my container.

[email protected]:~# ls -l /dev/dri
total 0 
drwxr-xr-x 2 root root        160 Mar  4 13:05 by-path 
crw-rw---- 1 root video  226,   0 Mar  4 13:05 card0 
crw-rw---- 1 root video  226,   1 Mar  4 13:05 card1 
crw-rw---- 1 root video  226,   2 Mar  4 13:05 card2 
crw-rw---- 1 root render 226, 128 Mar  4 13:05 renderD128 
crw-rw---- 1 root render 226, 129 Mar  4 13:05 renderD129 
crw-rw---- 1 root render 226, 130 Mar  4 13:05 renderD130

The GPU I actually want to pass through is a Radeon RX 480. (The GT 210 is only there because it is the cheapest way to play around with CUDA. I would like to pass that through to another container later on, but that’s a different story.)

[email protected]:~# lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] (rev c7) 
05:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2) 
07:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
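
One way to tell the nodes apart, sketched under assumptions: the by-path directory in the listing above normally holds symlinks named after each device's PCI address, so the address lspci reports for the RX 480 (03:00.0) can be matched to its cardN/renderDN pair. The function name dri_nodes_for_pci is hypothetical, and the 0000 PCI domain prefix is an assumption, since lspci omits it in the output above.

```shell
#!/bin/sh
# Print each DRM node that belongs to one PCI address.
# $1: PCI address including the domain, e.g. 0000:03:00.0 (assumed for the RX 480)
# $2: optional directory to search; defaults to /dev/dri/by-path
dri_nodes_for_pci() {
    dir="${2:-/dev/dri/by-path}"
    for link in "$dir"/pci-"$1"-*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

# Only query the real directory when it exists on this machine.
if [ -d /dev/dri/by-path ]; then
    dri_nodes_for_pci 0000:03:00.0
fi
```

Running ls -l /dev/dri/by-path directly shows the same symlinks for all three GPUs at once.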

With the previous result in mind, I added all cards with the corresponding IDs (what are these numbers, e.g. 226:0 and 226:1, called?) to my /etc/pve/lxc/400.conf like this:

lxc.cgroup.devices.allow: c 226:0 rwm 
lxc.cgroup.devices.allow: c 226:1 rwm 
lxc.cgroup.devices.allow: c 226:2 rwm 
lxc.cgroup.devices.allow: c 226:128 rwm 
lxc.cgroup.devices.allow: c 226:129 rwm 
lxc.cgroup.devices.allow: c 226:130 rwm 
lxc.autodev: 1 
lxc.hook.autodev: /var/lib/lxc/400/mount_hook.sh

As described in the guide, I also added /var/lib/lxc/400/mount_hook.sh as follows:

mkdir -p ${LXC_ROOTFS_MOUNT}/dev/dri 
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 0
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 1 
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/card0 c 226 2 
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 128  
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 129 
mknod -m 666 ${LXC_ROOTFS_MOUNT}/dev/dri/renderD128 c 226 130
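
A possible cause of a failing hook, offered as a sketch and not a confirmed diagnosis: the script as posted creates card0 three times and renderD128 three times. mknod fails with "File exists" when the path is already there, and a shell script exits with the status of its last command, so the final mknod would make the hook return non-zero. The version below gives every node its own name. In 226:0, 226 is the device major number and 0 is the minor number. The helper name dri_node_name is hypothetical, and the card/render naming rule is assumed from the host listing above.

```shell
#!/bin/sh
# Sketch of /var/lib/lxc/400/mount_hook.sh with one distinct node per minor number.

# Map a DRM minor number to its node name, following the host listing:
# minors below 128 are cardN, minors from 128 up are renderDN.
dri_node_name() {
    if [ "$1" -ge 128 ]; then
        echo "renderD$1"
    else
        echo "card$1"
    fi
}

# LXC exports LXC_ROOTFS_MOUNT to hooks; do nothing when it is unset,
# so running the script by hand never touches the host's /dev.
if [ -n "${LXC_ROOTFS_MOUNT:-}" ]; then
    mkdir -p "${LXC_ROOTFS_MOUNT}/dev/dri" || exit 1
    for minor in 0 1 2 128 129 130; do
        node="${LXC_ROOTFS_MOUNT}/dev/dri/$(dri_node_name "$minor")"
        # Skip nodes that already exist so a repeated run does not fail.
        [ -e "$node" ] || mknod -m 666 "$node" c 226 "$minor" || exit 1
    done
fi
```

Once the right GPU is known, the minor list in the loop can be cut down to just that card and its render node. The script also has to be executable for LXC to run it as a hook.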

However, when I start the LXC container in the foreground with “lxc-start 400 -F”, I get the following output with very weird formatting:

[email protected]:~# lxc-start 400 -F
lxc-start: 400: cgroups/cgfsng.c: mkdir_eexist_on_last: 1287 File exists - Failed to create directory "/sys/fs/cgroup/unified//lxc/400"
                                                                                                                                       lxc-start: 400: cgroups/cgfsng.c: container_create_path_for_hierarchy: 1336 Failed to create cgroup "/sys/fs/cgroup/unified//lxc/400"
                                                                                                                           lxc-start: 400: cgroups/cgfsng.c: cgfsng_payload_create: 1496 Failed to create cgroup "/sys/fs/cgroup/unified//lxc/400"
                                                                                                 lxc-start: 400: conf.c: run_buffer: 352 Script exited with status 1
                   lxc-start: 400: conf.c: lxc_setup: 3663 Failed to run autodev hooks
                                                                                      lxc-start: 400: start.c: do_start: 1338 Failed to setup container "400"
            lxc-start: 400: sync.c: __sync_wait: 62 An error occurred in another process (expected sequence number 5)
                                                                                                                     lxc-start: 400: start.c: lxc_abort: 1133 Function not implemented - Failed to send SIGKILL to 16023
                                                                       lxc-start: 400: start.c: __lxc_start: 2080 Failed to spawn container "400"
                                                                                                                                                lxc-start: 400: tools/lxc_start.c: main: 329 The container failed to start
lxc-start: 400: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options

When I remove “lxc.hook.autodev: /var/lib/lxc/400/mount_hook.sh”, this issue does not happen. However, as far as I understand, this script is needed to actually be able to use the GPU in the container. Is that correct?

I’m very new to passing GPUs to containers and VMs, and any help would be much appreciated!

Copied from my thread on Reddit, as I did not have any additional information…
