IOMMU Passthrough: Need some guidance attempting this with a dual GPU card

I’m following this guide:

I’ve run the iommu-test script, and these are the results:

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 10 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 10 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 10 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 10 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 10 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 10 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 10 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 11 01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
IOMMU Group 12 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] USB 3.1 XHCI Controller [1022:43bb] (rev 02)
IOMMU Group 12 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b7] (rev 02)
IOMMU Group 12 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b2] (rev 02)
IOMMU Group 12 04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 04:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 04:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 04:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 04:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 04:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 12 09:00.0 Network controller [0280]: Intel Corporation Device [8086:24fb] (rev 10)
IOMMU Group 12 0a:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU Group 13 0b:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ca)
IOMMU Group 14 0c:08.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ca)
IOMMU Group 15 0c:10.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch [10b5:8747] (rev ca)
IOMMU Group 16 0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] [1002:7300] (rev c9)
IOMMU Group 16 0d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji HDMI/DP Audio [Radeon R9 Nano / FURY/FURY X] [1002:aae8]
IOMMU Group 17 0e:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] [1002:7300] (rev c9)
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 7 11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
IOMMU Group 7 11:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
IOMMU Group 7 11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] USB3 Host Controller [1022:145c]
IOMMU Group 8 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 8 12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
IOMMU Group 8 12:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 8 12:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
IOMMU Group 9 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 9 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
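For reference, the iommu-test script itself isn't shown here. Listings in this format are commonly produced by a loop over /sys/kernel/iommu_groups along the lines of the minimal sketch below. This is an assumption about what such a script does, not a copy of it. Because the shell glob sorts group numbers as text, a loop like this prints groups 0 and 10 to 17 first, then 1 to 9, which matches the order above.

```shell
#!/bin/sh
# Sketch of an IOMMU group lister (assumption: similar in spirit to the
# iommu-test script, which is not shown). $1 is the sysfs root, default /sys.
list_iommu_groups() {
    root=${1:-/sys}
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        addr=${dev##*/}                    # e.g. 0000:0d:00.0
        group=${dev%/devices/*}            # .../iommu_groups/16
        group=${group##*/}                 # 16
        if [ "$root" = /sys ] && command -v lspci >/dev/null 2>&1; then
            printf 'IOMMU Group %s ' "$group"
            lspci -nns "$addr"             # adds class name and [vendor:device]
        else
            printf 'IOMMU Group %s %s\n' "$group" "$addr"
        fi
    done
}

list_iommu_groups "$@"
```

Piping the output through GNU sort -V should put the groups in numeric order.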

As the title says, though, I'm a bit of a special case: I have two GPUs, but only one card. I believe that's what the PLX Technology, Inc. entries in the listing above are for.

Each of these PLX entries is in a separate IOMMU group.

My understanding is that single-card, dual-GPU designs use a PLX chip (a PCIe switch) to connect the two GPUs.

Since this chip's entries sit in separate IOMMU groups, I'm hoping I can split off one GPU and pass it through to a Windows VM, while keeping the other GPU dedicated to my Linux host.

Obviously this is outside what that guide covers, so I need help with it. I also don't know whether there's something else in those groupings I should be looking at that's necessary for this to work. Any advice is appreciated.
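One thing worth checking in those groupings: VFIO generally requires every endpoint device in an IOMMU group to be bound to vfio-pci (or to no driver) before any of them can be assigned to a guest. Per the listing, the Fiji at 0d:00.0 shares group 16 only with its HDMI audio function 0d:00.1, and the second Fiji at 0e:00.0 is alone in group 17. A minimal sketch that prints the members of a given device's group through its iommu_group symlink in sysfs (the second parameter exists only so it can be tested against a fake tree):

```shell
#!/bin/sh
# Print the PCI addresses that share an IOMMU group with device $1.
# $2 is the sysfs root (default /sys; overridable for testing).
group_members() {
    addr=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$addr/iommu_group"
    if [ ! -d "$link/devices" ]; then
        echo "no IOMMU group for $addr (is the IOMMU enabled?)" >&2
        return 1
    fi
    for member in "$link"/devices/*; do
        [ -e "$member" ] || continue
        echo "${member##*/}"
    done
}
```

Going by the listing above, group_members 0000:0e:00.0 should print only 0000:0e:00.0.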

It's tricky because it's the same GPU, with the same PCI device IDs, etc. On the Arch wiki there is a script that binds vfio-pci to all non-boot VGA adapters. You will have to do it that way: a script has to run from your initial RAM disk (initramfs) that loads this module for all non-boot VGA adapters…

Google that and let us know if you can't figure it out. That should fix you up, though.
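To see why the usual vfio.conf approach (options vfio-pci ids=vendor:device) doesn't fit here: per the listing, both Fiji GPUs report the same ID, [1002:7300], so an ids= match would claim both of them, including the one the host needs, if vfio-pci loads first. A small sketch that lists every PCI device matching a vendor:device pair by reading the vendor and device files in sysfs (IDs are expected in lowercase hex, as sysfs prints them):

```shell
#!/bin/sh
# List PCI devices whose vendor:device ID matches $1 (e.g. 1002:7300).
# $2 is the sysfs root (default /sys; overridable for testing).
devices_with_id() {
    want_vendor=0x${1%%:*}
    want_device=0x${1##*:}
    root=${2:-/sys}
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        if [ "$(cat "$dev/vendor")" = "$want_vendor" ] &&
           [ "$(cat "$dev/device")" = "$want_device" ]; then
            echo "${dev##*/}"
        fi
    done
}
```

Going by the listing, devices_with_id 1002:7300 should print both 0000:0d:00.0 and 0000:0e:00.0. That is why the selection has to be made some other way, such as by boot_vga or by PCI address.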


So, following the Special Procedures here:
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Using_identical_guest_and_host_GPUs

Script installation

Create /etc/modprobe.d/vfio.conf with the following:

install vfio-pci /sbin/vfio-pci-override.sh

Edit /etc/mkinitcpio.conf:

Remove any video drivers from MODULES, and add vfio-pci and vfio_iommu_type1:

MODULES="ext4 vfat vfio-pci vfio_iommu_type1"

Add "/etc/modprobe.d/vfio.conf" and "/sbin/vfio-pci-override.sh" to FILES:

FILES="/etc/modprobe.d/vfio.conf /sbin/vfio-pci-override.sh"

Regenerate your initramfs and reboot:

mkinitcpio -p linux

The file /etc/mkinitcpio.conf doesn’t exist.

The command mkinitcpio doesn’t either.

This is on Fedora 26, so I can see it doing things differently; your guide says:

Install Fedora 26 according to one’s liking.

Am I supposed to be doing this on Arch?

Edit: I see now.

I should skip the mkinitcpio steps and continue in your guide from:

Edit vfio.conf and add options to specify the vendor and device IDs of one’s graphics card that one wishes to pass through to the virtual machine:

Edit 2: And it’s cloudy again.

[user@host]$ sudo dracut –f –kver `uname –r`
uname: extra operand ‘–r’
Try ‘uname --help’ for more information.
dracut: Cannot find module directory /lib/modules/–kver/
dracut: and --no-kernel was not specified

That's -r with a plain hyphen, not an en dash followed by r (same for -f and --kver). Copy-pasting from a web page is… not good.
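With plain ASCII hyphens the command reads: sudo dracut -f --kver "$(uname -r)". Since look-alike dashes are easy to miss, here is a small sketch of a hypothetical helper (not part of dracut or any other tool) that flags en or em dashes in a pasted command line before you run it:

```shell
#!/bin/sh
# Hypothetical helper: warn if a command line contains Unicode dashes
# (U+2013 en dash, U+2014 em dash), which shells treat as ordinary text.
check_dashes() {
    en=$(printf '\342\200\223')   # UTF-8 bytes of U+2013
    em=$(printf '\342\200\224')   # UTF-8 bytes of U+2014
    case $1 in
        *"$en"* | *"$em"*)
            echo "warning: non-ASCII dash found; retype it as '-'" >&2
            return 1
            ;;
    esac
    return 0
}
```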

anyway,

Create a small script; I've named mine /sbin/vfio-pci-override-vga.sh. It contains:

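#!/bin/sh

# Bind every non-boot VGA device (and its .1 audio function) to vfio-pci
for i in $(find /sys/devices/pci* -name boot_vga); do
        if [ "$(cat "$i")" -eq 0 ]; then
                GPU=$(dirname "$i")
                AUDIO=$(echo "$GPU" | sed -e "s/0$/1/")
                echo "vfio-pci" > "$GPU/driver_override"
                if [ -d "$AUDIO" ]; then
                        echo "vfio-pci" > "$AUDIO/driver_override"
                fi
        fi
done

# -i (--ignore-install) stops modprobe from re-running the install hook in vfio.conf
modprobe -i vfio-pci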
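One caveat with this script on this particular card, as far as I can tell: the kernel only creates a boot_vga file for VGA-class (0x0300) devices. Per the listing at the top, the second Fiji at 0e:00.0 reports itself as a Display controller [0380], so the find loop above would never see it. Below is a minimal variant built on that assumption. It walks /sys/bus/pci/devices, picks every display-class (0x03xxxx) device that isn't the boot VGA device, and sets driver_override on that device and on its .1 function if one exists. The file name vfio-pci-override-display.sh is hypothetical, and the install line in /etc/modprobe.d/vfio.conf would need to point at whatever name you actually use.

```shell
#!/bin/sh
# Variant of the override script that also catches non-VGA display
# controllers (class 0x0380), which have no boot_vga file.
# $1 is the sysfs root (default /sys; overridable for testing).
override_nonboot_display() {
    root=${1:-/sys}
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        case $(cat "$dev/class") in
            0x03*) ;;                  # any display class: VGA, 3D, other
            *) continue ;;
        esac
        # Skip the GPU the firmware booted on (boot_vga exists and is 1).
        if [ -r "$dev/boot_vga" ] && [ "$(cat "$dev/boot_vga")" = 1 ]; then
            continue
        fi
        echo vfio-pci > "$dev/driver_override"
        audio=${dev%.0}.1              # HDMI/DP audio is usually function 1
        if [ -d "$audio" ]; then
            echo vfio-pci > "$audio/driver_override"
        fi
    done
}

# Only act when run as the (hypothetically named) modprobe install hook.
case $0 in
    */vfio-pci-override-display.sh)
        override_nonboot_display /sys
        exec modprobe -i vfio-pci
        ;;
esac
```

Going by the listing, this would be expected to mark only 0e:00.0 and leave 0d:00.0 and its audio function to the host. driver_override only affects the next probe, so it needs to run before amdgpu binds the device, as it would from the initramfs.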

This is for Fedora 21, but it should be really close to what you are looking for. dracut is Fedora's equivalent of mkinitcpio.
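On Fedora, the mkinitcpio MODULES and FILES lines from the Arch procedure roughly correspond to dracut's add_drivers and install_items settings. Here is a sketch of a drop-in config, assuming the script lives at /sbin/vfio-pci-override-vga.sh. The file name /etc/dracut.conf.d/vfio.conf is an arbitrary choice. find and dirname are listed because the script calls them and they may not otherwise be in the initramfs; that, and the /usr/bin paths, are assumptions worth verifying.

```shell
# /etc/dracut.conf.d/vfio.conf (hypothetical name; dracut reads *.conf here)
# Kernel modules to include in the initramfs (counterpart of mkinitcpio MODULES).
add_drivers+=" vfio vfio_iommu_type1 vfio-pci "
# Extra files to copy in (counterpart of mkinitcpio FILES).
install_items+=" /etc/modprobe.d/vfio.conf /sbin/vfio-pci-override-vga.sh /usr/bin/find /usr/bin/dirname "
```

After that, regenerate with sudo dracut -f --kver "$(uname -r)" and check the result with lsinitrd | grep -i vfio.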


So following the guides, I got Windows installed first with the virtual devices.

Whenever I remove the virtual devices and replace them with physical ones (i.e. the GPU I wish to pass through and my USB devices), I get this error when trying to start the VM:

Error starting domain: internal error: Failed to load PCI stub module vfio-pci

Checking whether vfio-pci loaded at boot:

$ dmesg | grep -i vfio
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.13.5-200.fc26.x86_64 root=UUID=7fd4e7e6-f813-4127-9c84-2e43d79ab84d ro rootflags=subvol=root rhgb quiet iommu=1 amd_iommu=on rd.driver.pre=vfio-pci
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.13.5-200.fc26.x86_64 root=UUID=7fd4e7e6-f813-4127-9c84-2e43d79ab84d ro rootflags=subvol=root rhgb quiet iommu=1 amd_iommu=on rd.driver.pre=vfio-pci
[ 9127.339019] VFIO - User Level meta-driver version: 0.3
$
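Two things in that output stand out. There are no vfio_pci lines, and the VFIO meta-driver only loaded about 9127 seconds after boot. That suggests vfio-pci wasn't loaded from the initramfs and was only attempted later, presumably when starting the VM. Because /etc/modprobe.d/vfio.conf redirects modprobe vfio-pci to an install command, modprobe fails if that command fails, for example if the script path doesn't exist or isn't executable. That is one possible cause of "Failed to load PCI stub module vfio-pci". Note that the install line earlier in the thread names /sbin/vfio-pci-override.sh, while the script was saved as /sbin/vfio-pci-override-vga.sh. A minimal sketch that checks each matching install line, assuming the hook is a plain path (only its first word is checked):

```shell
#!/bin/sh
# Check that the install hook for a module (default vfio-pci) points at an
# executable file. $2 is the modprobe.d directory (default /etc/modprobe.d).
check_install_hook() {
    module=${1:-vfio-pci}
    confdir=${2:-/etc/modprobe.d}
    hooks=$(cat "$confdir"/*.conf 2>/dev/null | awk -v want="$module" '
        BEGIN { gsub(/-/, "_", want) }          # modprobe treats - and _ alike
        $1 == "install" { m = $2; gsub(/-/, "_", m); if (m == want) print $3 }')
    if [ -z "$hooks" ]; then
        echo "no install line for $module in $confdir"
        return 0
    fi
    status=0
    for hook in $hooks; do
        if [ -x "$hook" ]; then
            echo "ok: $hook"
        else
            echo "MISSING or not executable: $hook"
            status=1
        fi
    done
    return $status
}
```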

This is from the Arch Wiki guide page:

Reboot and verify that vfio-pci has loaded properly and that it is now bound to the right devices.

$ dmesg | grep -i vfio 
[    0.329224] VFIO - User Level meta-driver version: 0.3
[    0.341372] vfio_pci: add [10de:13c2[ffff:ffff]] class 0x000000/00000000
[    0.354704] vfio_pci: add [10de:0fbb[ffff:ffff]] class 0x000000/00000000
[    2.061326] vfio-pci 0000:06:00.0: enabling device (0100 -> 0103)

Not every device from vfio.conf (not even the expected one) necessarily appears in the dmesg output. Sometimes a device doesn't show up at boot but is still visible and usable in the guest VM.

$ lspci -nnk -d 10de:13c2
06:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] [10de:13c2] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau nvidia
$ lspci -nnk -d 10de:0fbb
06:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

However, I get:

$ sudo lspci | grep vfio
$ 

Nothing, so vfio loaded but didn’t bind to anything.
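Worth noting: plain lspci doesn't print kernel drivers at all, so lspci | grep vfio returns nothing even when binding works. The Arch wiki check above uses lspci -nnk, which adds the "Kernel driver in use:" line. For this card that would be lspci -nnk -d 1002:7300, which per the listing should show both Fiji GPUs. The same information is available in sysfs as the device's driver symlink. Here is a small sketch (the second parameter exists only for testing):

```shell
#!/bin/sh
# Print the driver currently bound to PCI device $1 (e.g. 0000:0e:00.0),
# or "none". $2 is the sysfs root (default /sys).
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -e "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```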

Listing my PCI devices, I only see one VGA Controller and accompanying Audio device.

I take this to mean that I can't do this, since the other GPU isn't listed as a device to bind to.

It may be that I need the AMDGPU-PRO drivers to even access the other GPU, but I'd find that odd, since I can see the PLX chip directly.
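Going by the IOMMU listing at the top, the second GPU does appear to be there, at 0e:00.0. It reports class 0380 ("Display controller") rather than 0300 ("VGA compatible controller"), so anything that filters on "VGA", or on the VGA-only boot_vga file, will skip it. A device shows up in lspci from its PCI config space whether or not a driver is loaded. A sketch that lists every display-class device along with its class code and its boot_vga flag, if it has one:

```shell
#!/bin/sh
# List display-class PCI devices (class 0x03xxxx): VGA (0x0300) and
# non-VGA "Display controller" (0x0380) alike. $1 is the sysfs root.
list_display_devices() {
    root=${1:-/sys}
    for dev in "$root"/bus/pci/devices/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case $class in
            0x03*) ;;
            *) continue ;;
        esac
        # boot_vga only exists for VGA-class devices
        if [ -r "$dev/boot_vga" ]; then
            boot=$(cat "$dev/boot_vga")
        else
            boot=n/a
        fi
        printf '%s class=%s boot_vga=%s\n' "${dev##*/}" "$class" "$boot"
    done
}

list_display_devices "$@"
```

On this system it would be expected to show 0000:0e:00.0 with class=0x038000 and boot_vga=n/a.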