These are my notes on everything I did to set up a KVM server with GPU passthrough on OpenSUSE Tumbleweed, with a Ryzen 5 1600X on an ASRock X470 Master SLI motherboard.
EDIT: if you check a few posts below, I also list some information about adapting what I did to work on a Threadripper system.
Since I took quite a bit of information from the guides posted in this forum, I figured I would give back by writing down everything I did here.
It’s in the form of a tutorial, because why not. Most of what I write here also applies to Fedora and similar distros.
START
On Ryzen motherboards, IOMMU/passthrough is broken between AGESA 0.0.7.2 and AGESA 1.0.0.4 patch B (also called ComboPI), so update the UEFI if needed. The latest firmware for my motherboard had the fixed AGESA, and I had to update to it, otherwise I would only get errors when enabling passthrough.
Then go into the UEFI setup and enable IOMMU (set it to “Enabled”, not “Auto”; “Auto” usually gives a worse IOMMU grouping, meant to work around Windows issues) and PCIe ACS (I found “ACS Enable” under Advanced\AMD CBS\NBIO Common Options; set it to “Enable”). Enable SR-IOV too if you want.
Then add "amd_iommu=force_isolation iommu=pt" to the kernel command line and reboot.
This is usually done by editing GRUB or bootloader settings; on OpenSUSE I used YaST’s “Bootloader” menu to set this.
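After the reboot it can be worth confirming that the parameters actually reached the kernel. The following is only a minimal sketch: `cmdline_has` is a made-up helper name, and `CMDLINE_FILE` is a hypothetical override that exists solely so the function can be tried against a test file instead of the real /proc/cmdline.

```shell
#!/bin/sh
# Succeeds only if every argument appears as a whole word on the kernel
# command line. CMDLINE_FILE is a hypothetical override for testing;
# by default the real /proc/cmdline is read.
cmdline_has() {
    file="${CMDLINE_FILE:-/proc/cmdline}"
    for param in "$@"; do
        # pad with spaces so "iommu=pt" cannot match inside a longer word
        case " $(cat "$file") " in
            *" $param "*) ;;
            *) echo "missing: $param" >&2; return 1 ;;
        esac
    done
}

# Example: cmdline_has amd_iommu=force_isolation iommu=pt && echo "IOMMU parameters active"
```

Matching whole words matters here because "iommu=pt" is also a substring of other possible parameters, so a plain grep could report a false positive.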
All the commands I run here are executed as the root user, or with sudo.
Create this script to check which IOMMU group each device is in:
#!/bin/bash
# List PCI devices per IOMMU group; [RESET] marks devices that expose a reset method
shopt -s nullglob
for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do
    echo "IOMMU group $(basename "$iommu_group")"
    for device in $(\ls -1 "$iommu_group"/devices/); do
        if [[ -e "$iommu_group"/devices/"$device"/reset ]]; then
            echo -n "[RESET]"
        fi
        echo -n $'\t'
        lspci -nns "$device"
    done
done
It will dump a list of all onboard hardware, grouped by IOMMU group, with [RESET] in front of each device that supports resetting (needed for passthrough).
The groups are not printed in numerical order though, so you will have to scroll around to find the devices you are looking for.
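If the scrolling gets tedious, the listing can be sorted by group number. This is a rough sketch rather than a replacement for the script above: `iommu_groups_sorted` is a made-up name, and its optional argument is a hypothetical override of the sysfs root that only exists so the function can be tried on a test directory.

```shell
#!/bin/sh
# Print "group<TAB>pci-address" for every device, sorted numerically by group.
# The optional argument is a hypothetical override of the sysfs root, for testing.
iommu_groups_sorted() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # the glob matched nothing
        group="${dev%/devices/*}"      # drop the trailing "/devices/<address>"
        printf '%s\t%s\n' "${group##*/}" "${dev##*/}"
    done | sort -k1,1n -k2,2
}

# Example: feed each address to lspci for the full description
# iommu_groups_sorted | while read -r group addr; do
#     printf 'IOMMU group %s\t' "$group"; lspci -nns "$addr"
# done
```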
Next, we must bind all hardware that is going to be passed through to vfio-pci.
This can be done by vendor:device ID with kernel command line and modprobe options, but that is not as flexible as listing the PCIe addresses manually and having a script force vfio-pci on them through “driver_override” (a feature added in kernel 3.16, so it should be available everywhere now).
create a file:
nano /etc/modprobe.d/gpu-passthrough.conf
write inside the file:
install vfio-pci /sbin/vfio-pci-override.sh
this means that it will run the script when asked to install the vfio-pci module
Then create the script with
nano /sbin/vfio-pci-override.sh
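#!/bin/sh
# PCI addresses of the devices to reserve for passthrough
DEVS="0000:0a:00.0 0000:0a:00.1 0000:08:00.0 0000:09:00.0"
for DEV in $DEVS; do
    # make vfio-pci the only driver allowed to bind this device
    echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
done
# -i (--ignore-install) loads the real module instead of re-running this script
modprobe -i vfio-pci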
In the DEVS variable at the top, put the PCI addresses (as listed by lspci, which omits the leading “0000:” domain unless run with -D) of everything you want to pass through. I have four entries in this example: the first two are for a GPU, the other two are for a couple of USB 3.0 cards I want to pass through as well (I have plenty of native USB 3.0 for the host already, and I prefer to have these “native” in the VM instead of doing USB passthrough each time I need to connect a USB device).
Note: most modern GPUs have 2 entries, one for the GPU and one for the “audio device” they use to send the audio stream over HDMI/DisplayPort; both should be added.
Make it executable:
chmod +x /sbin/vfio-pci-override.sh
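PCI addresses can shift when cards are added or removed, so a variant of the loop that complains about addresses with no matching device may save some head scratching. This is a sketch under assumptions, not the script I actually run: `set_vfio_override` is a made-up name and `SYSFS_PCI` is a hypothetical override of the sysfs path, there only for trying the function on a test directory.

```shell
#!/bin/sh
# Variant of the override loop that reports addresses with no matching device.
# SYSFS_PCI is a hypothetical override of the sysfs path, used only for testing.
set_vfio_override() {
    root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    rc=0
    for dev in "$@"; do
        if [ -e "$root/$dev/driver_override" ]; then
            echo "vfio-pci" > "$root/$dev/driver_override"
        else
            echo "vfio-pci-override: no PCI device at $dev, skipping" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example: set_vfio_override 0000:0a:00.0 0000:0a:00.1
```

The function returns non-zero if any address was missing, but still sets the override on the ones that do exist.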
Now we need to tell dracut to include these files and the vfio-pci module in the initramfs. Create a file with
nano /etc/dracut.conf.d/gpu-passthrough.conf
and write inside it
force_drivers+=" vfio vfio-pci vfio_iommu_type1 "
install_items+=" /sbin/vfio-pci-override.sh "
For some reason, “add_drivers” as suggested in other guides does not do the job.
Note the spaces just inside the quotes: they are important, so that the entries don’t get fused with those from other files that append to the same lists.
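To see why the padding spaces matter, here is a tiny demonstration, assuming (as I understand it) that dracut reads the files in /etc/dracut.conf.d as shell fragments that append to shared variables. The "nvme" entry is a hypothetical line from some other config file, and `no_space` is just a scratch variable for the comparison.

```shell
#!/bin/bash
# Simulate two config files appending to the same variable, as sourced shell fragments.
force_drivers=""
force_drivers+=" nvme "                             # hypothetical entry from another file
force_drivers+=" vfio vfio-pci vfio_iommu_type1 "   # the entry from this guide
echo "with spaces:    [$force_drivers]"

no_space=""
no_space+="nvme"                                    # same entries, padding removed
no_space+="vfio vfio-pci vfio_iommu_type1"
echo "without spaces: [$no_space]"                  # "nvme" and "vfio" fuse into "nvmevfio"
```

Without the padding, the last word of one file and the first word of the next end up glued together into a module name that does not exist.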
Rebuild the initramfs only for the current kernel (so we have a fallback in case what we did breaks things).
How to do this depends on the distro; on OpenSUSE it’s
mkinitrd -k $(uname -r)
Check that everything we need is in the initramfs (this is how I found out that “add_drivers” didn’t do anything).
lsinitrd | grep vfio
drwxr-xr-x 1 root root 0 Apr 28 04:34 lib/modules/5.6.4-1-default/kernel/drivers/vfio
drwxr-xr-x 1 root root 0 Apr 28 04:34 lib/modules/5.6.4-1-default/kernel/drivers/vfio/pci
-rw-r--r-- 1 root root 25824 Apr 18 03:27 lib/modules/5.6.4-1-default/kernel/drivers/vfio/pci/vfio-pci.ko.xz
-rw-r--r-- 1 root root 13480 Apr 18 03:27 lib/modules/5.6.4-1-default/kernel/drivers/vfio/vfio_iommu_type1.ko.xz
-rw-r--r-- 1 root root 12908 Apr 18 03:27 lib/modules/5.6.4-1-default/kernel/drivers/vfio/vfio.ko.xz
-rw-r--r-- 1 root root 3344 Apr 18 03:27 lib/modules/5.6.4-1-default/kernel/drivers/vfio/vfio_virqfd.ko.xz
-rwxr-xr-x 1 root root 157 Apr 28 03:38 sbin/vfio-pci-override.sh
Now add the following to the kernel command line (usually done by editing GRUB or bootloader settings; on OpenSUSE I used YaST’s “Bootloader” menu to set this).
amd_iommu=force_isolation iommu=pt rd.driver.pre=vfio-pci
the first two are needed to enable IOMMU with passthrough on the Linux side, and the last is to load the vfio-pci driver before anything else so the script we added will be triggered.
Reboot the system and check that the card is using the vfio-pci driver
lspci -k
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland GL [FirePro W2100]
Subsystem: Dell Device 2120
Kernel driver in use: vfio-pci
Kernel modules: radeon, amdgpu
0a:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Oland/Hainan/Cape Verde/Pitcairn HDMI Audio [Radeon HD 7000 Series]
Subsystem: Dell Device aab0
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
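Instead of reading through the whole lspci -k output, the bound driver can also be read straight from sysfs. A minimal sketch, with made-up helper names (`bound_driver`, `check_vfio_bound`) and a hypothetical `SYSFS_PCI` override that is only there so the functions can be tried on a test directory:

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# SYSFS_PCI is a hypothetical override of the sysfs path, used only for testing.
bound_driver() {
    root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    if [ -L "$root/$1/driver" ]; then
        # the "driver" entry is a symlink whose last component is the driver name
        basename "$(readlink "$root/$1/driver")"
    else
        echo "none"
    fi
}

# Report the driver of each address; fail if any is not bound to vfio-pci.
check_vfio_bound() {
    rc=0
    for dev in "$@"; do
        drv=$(bound_driver "$dev")
        echo "$dev: $drv"
        [ "$drv" = "vfio-pci" ] || rc=1
    done
    return "$rc"
}

# Example: check_vfio_bound 0000:0a:00.0 0000:0a:00.1
```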
IOMMU groups.
Unless you are doing this on server-grade equipment, or you have a Threadripper or another CPU where all PCIe lanes on the board come from the CPU’s own controllers, some IOMMU groups will contain more than one device.
Usually these are the integrated devices (connected to or provided by the chipset) and the PCIe slots provided by the chipset.
If you follow the steps above to pass such a device through, you get errors like this when you start the VM:
Error starting domain: internal error: qemu unexpectedly closed the monitor: 2020-04-28T15:36:59.742769Z qemu-system-x86_64: -device vfio-pci,host=0000:08:00.0,id=hostdev2,bus=pci.9,addr=0x0: vfio 0000:08:00.0: group 13 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
It means that the PCIe port places this device in an IOMMU group with other devices, and you must bind everything in that group to the vfio driver.
In my case, this is a USB 3.0 card that has been placed in IOMMU group 13, together with other devices connected to or provided by the chipset, like the onboard SATA and USB controllers, plus other cards that I’m not passing through, like the Broadcom Ethernet card and the R5 230 that is the KVM host GPU,
as shown by the IOMMU group dump script above (only group 13 shown):
IOMMU group 13
[RESET] 01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43d0] (rev 01)
01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
[RESET] 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
02:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
[RESET] 04:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
[RESET] 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos PRO [Radeon HD 7450] [1002:677b]
[RESET] 05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Caicos HDMI Audio [Radeon HD 6450 / 7450/8450/8490 OEM / R5 230/235/235X OEM] [1002:a...
[RESET] 06:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM5709 Gigabit Ethernet [14e4:1639] (rev 20)
06:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM5709 Gigabit Ethernet [14e4:1639] (rev 20)
[RESET] 08:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912:0014] (rev 03)
[RESET] 09:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912:0014] (rev 03)
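To answer “what else is in the same group as this card?” without scanning the whole dump, the group can be followed from the device itself in sysfs. Again just a sketch: `group_members` is a made-up name and `SYSFS_PCI` is a hypothetical override used only for trying it on a test directory.

```shell
#!/bin/sh
# List every device sharing an IOMMU group with the given PCI address.
# SYSFS_PCI is a hypothetical override of the sysfs path, used only for testing.
group_members() {
    root="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for member in "$root/$1"/iommu_group/devices/*; do
        [ -e "$member" ] || continue    # no IOMMU group, or the glob matched nothing
        echo "${member##*/}"
    done
}

# Example: group_members 0000:08:00.0
```

The output includes the device itself, so a device that is alone in its group prints exactly one line.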
- The first (and best) way to deal with this is to move the cards to another PCIe slot served by the CPU: either one of the x16 slots (the ones meant for GPUs) or the primary M.2 slot. In general, every slot that supports Gen3 PCIe must come from the CPU, because the X470 chipset does NOT provide Gen3 PCIe lanes.
  If you are booting with CSM disabled on a Ryzen system, the default boot GPU is the one on the chipset, probably because it has a lower PCIe address. This is perfect for a passthrough GPU configuration (my “boot GPU” is an Asus R5 230, which seems to be the only model of R5 230 that has EFI/GOP capability, so it can actually boot with CSM disabled).
  Moving cards around is kind of annoying on this board, but it is probably the only “secure” choice. I have the GPU I want to pass through in one of the main x16 slots, and the other x16 slot is occupied by a SAS HBA card that runs the SAS drives in the VM storage array. I also have an M.2 slot and a dumb M.2 to PCIe x4 adapter.
- Another way to deal with this (as also suggested by the error message) is to bind all devices in this group to the vfio-pci driver. In my case that’s a bit annoying, as more or less all x1 PCIe slots and the chipset SATA and USB controllers are in this group. If I did that, I would have to move the root filesystem to an NVMe drive or to a drive connected to the HBA card, and then whenever a VM uses even just ONE of these devices, all the others in the same IOMMU group are blocked off for other VMs.
- The third option is ACS override, which is basically “let’s assume ACS exists, so it’s safe to split the devices into more IOMMU groups”, and imho it is kind of bad. It may work fine, either because the PCIe bridge genuinely does not allow peer-to-peer communication and just fails to report that, or because the VMs simply never abuse the capability, but I’m using VMs for isolation and I don’t like relying on that.
For a better explanation of these options, see the YouTube video “A little about Passthrough, PCIe, IOMMU Groups and breaking them up” by Spaceinvader One. Yes, he is using Unraid, which I personally don’t like much, but he details the kernel command line to use for each ACS override possibility.
So I chose option 1: I moved the USB PCIe card to the CPU’s M.2 slot and added its PCIe address to the vfio script list so it’s locked out and not used by Linux. I passed it through, solved some firmware issues with it, and boom, all is fine.