VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT]

I’ve been attempting to set this up for the past couple of days, following a different guide and now this one, using Ubuntu 19.04. Both guides have produced the same list of errors, which Google can’t seem to find an answer for:

Error starting domain: internal error: qemu unexpectedly closed the monitor: 2019-07-28T14:54:10.600494Z qemu-system-x86_64: -device vfio-pci,host=0a:00.0,id=hostdev0,bus=pci.5,addr=0x0: vfio 0000:0a:00.0: failed to setup container for group 16: failed to set iommu for container: Operation not permitted

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1400, in startup
    self._backend.create()
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1080, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-07-28T14:54:10.600494Z qemu-system-x86_64: -device vfio-pci,host=0a:00.0,id=hostdev0,bus=pci.5,addr=0x0: vfio 0000:0a:00.0: failed to setup container for group 16: failed to set iommu for container: Operation not permitted

I’m guessing it’s some type of permission issue, but I have no idea where to begin troubleshooting it. Hardware is a 1950X, an ASUS PRIME X399-A, and two identical R9 290Xs.

Take what I say with a grain of salt as I’ve not actually succeeded myself, but my 2cp (guess) is that maybe IOMMU isn’t enabled in the BIOS/kernel options, or that you’re trying to pass through a group that has a device with a driver attached to it?

/shrug. Without any details, or experience with the errors you might get, I dunno. Just my gut.

My gut says I missed enabling something somewhere too, but I enabled SVM in the BIOS. The instructions mention something about enabling an IOMMU function, but the only one I found in the BIOS has something to do with allowing the 2nd CPU die to pass through IOMMU; I can’t remember all of it off the top of my head. Basically it was saying that CPU die 0 can already function with IOMMU, but enabling that option will allow CPU die 1 to do the same.

Unfortunately, when I enable it, Linux refuses to boot, so I’m really hoping it isn’t required.

I did add amd_iommu=on to /etc/default/grub. I assumed it needed to be appended onto the line that already said GRUB_CMDLINE_LINUX_DEFAULT="quiet splash", unless I was supposed to create a new line.
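For what it’s worth, I went with appending; mine ended up as:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on"

followed by sudo update-grub and a reboot.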

Then, when I was done with everything, the device driver showed up as vfio-pci, so I assume it’s not a driver issue.
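The way I checked, in case it matters, was lspci’s kernel-driver view:

lspci -nnk -s 0a:00.0

which reports a “Kernel driver in use: vfio-pci” line once the binding has taken.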

This is why I’m asking for help now; I’m hoping to find someone who has done this successfully, as I’ve exhausted my resources for figuring out these errors on my own.

Well, that’s the first thing I’d look into: whether IOMMU is actually enabled and functioning (in the BIOS). If not, then the grub default isn’t going to do much. There is probably a boot log message you can search for, or something else you can query, to check if IOMMU support is actually working in Linux. Actually, the way Wendell talks about it, all you need to do to verify IOMMU is enabled in the BIOS is run his IOMMU group script. Makes sense.
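Something like this should turn up AMD-Vi/IOMMU lines in the kernel log if it’s actually on (my guess at the quickest check):

sudo dmesg | grep -i -e iommu -e amd-vi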

What I meant by driver is not one attached to your GPU, but one on an unrelated device that happens to be in the same IOMMU group. In other words, is the GPU either isolated in its own group, or grouped only with other devices you’ve reserved to pass through to the VM?

Running his script generates the listing below.
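(Side note for anyone following along without the script handy: it’s essentially the standard loop over /sys/kernel/iommu_groups. This is my sketch, not necessarily Wendell’s exact version, but it prints the same sort of listing:

for d in /sys/kernel/iommu_groups/*/devices/*; do
    # strip the path down to just the group number
    n=${d#/sys/kernel/iommu_groups/}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    # describe the device at that PCI address
    lspci -nns "${d##*/}"
done

The output:)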

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 11 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 11 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 12 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 12 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 12 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 12 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 12 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 12 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 12 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 13 00:19.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 13 00:19.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 13 00:19.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 13 00:19.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 13 00:19.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 13 00:19.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 13 00:19.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 13 00:19.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 14 01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller [1022:43ba] (rev 02)
IOMMU Group 14 01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset SATA Controller [1022:43b6] (rev 02)
IOMMU Group 14 01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset PCIe Bridge [1022:43b1] (rev 02)
IOMMU Group 14 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 02:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 14 05:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU Group 14 08:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
IOMMU Group 15 09:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
IOMMU Group 15 09:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
IOMMU Group 16 0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] [1002:67b0]
IOMMU Group 16 0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] [1002:aac8]
IOMMU Group 17 0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
IOMMU Group 18 0b:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU Group 19 0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU Group 1 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 20 0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
IOMMU Group 21 0c:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 22 0c:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
IOMMU Group 2 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 3 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 5 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 9 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge [1022:1452]

So I’m under the assumption IOMMU is working.

The GPU that I’m trying to pass through is in Group 16, which only contains the GPU itself and its HDMI audio device. So unless there’s a conflicting audio driver in the way, there’s nothing else in the group that should be stopping it.

Oh, are you not passing through the HDMI audio too? Gotta do both. Otherwise, here is one thread that may or may not help (didn’t read it all) -

I’ve been passing through both during my entire time of troubleshooting. It hasn’t made a difference. The thread you linked is for Proxmox, so a lot of the troubleshooting tips people offered can’t be used since I’m using Ubuntu.
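For reference, the two hostdev entries in my XML look roughly like this (reciting from memory, so treat it as a sketch):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
  </source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
  </source>
</hostdev>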

Still, it’s the same error from qemu, so I thought there might be something in there to spark an idea. It’s too easy to say ‘it can’t be that’ when troubleshooting… and that bias can come back to bite you :wink:

I understand, but we can see that group 16 only has 2 items, and I’ve passed both through. So unless I need to pass through the slot itself (which is in a different group), I don’t see how this is the fix. There’s nothing else to add.

Well, something’s not set right, or IOMMU support on your system is spotty. /shrug. Like I said, I’m no expert, just giving ideas since you seemed to have run out of them.

Did you catch this note in the OP? Perhaps you got the numbers wrong. Hell, with my bad eyes, sometimes I can look at an identifier 20+ times and be adamant it’s correct, but the 21st time? Ohhh… oops… typo.

I do appreciate the ideas/brainstorming you’re trying to offer, ’cause I’m fresh out of those.

Ah, yes. I did want to verify that I got those prefixes correct, so I checked the /sys/bus/pci/devices directory. This was the ls -l output for my 0a:00.0 & 0a:00.1 devices:

lrwxrwxrwx 1 root root 0 Jul 28 10:40 0000:0a:00.0 -> ../../../devices/pci0000:00/0000:00:03.1/0000:0a:00.0
lrwxrwxrwx 1 root root 0 Jul 28 10:40 0000:0a:00.1 -> ../../../devices/pci0000:00/0000:00:03.1/0000:0a:00.1

Assuming I’m reading these correctly, all the prefixes are simply 0000.

Success!

So that BIOS option that I was hoping I really didn’t need? Yeah, I needed it.
What it was called:

Enumerate all IOMMU in IVRS
[Disable] Allows the IOMMU on the primary CPU die to map all device-visible virtual addresses.
[Enable] Enables the IOMMU on both CPU dies to map device-visible virtual addresses

So my best guess as to why it was causing my OS to hang on startup is that the instructions I followed the first time around weren’t appropriate for my setup. Following Wendell’s guide for Pop!_OS (even though that’s not what I’m using) yielded success once I enabled it in the BIOS.

My best guess as to why this setting is necessary is that CPU die 2 (or 1, however you prefer to look at it) must be handling the slot the GPU I want to pass through is in. If I were trying to pass through the GPU in the top slot (CPU 1 or 0), then I assume the setting wouldn’t be needed.

Of course I’m probably completely wrong about all of that. :smiley:

Well, next step: I’m going to re-install Windows, because I installed it on an emulated disk and I found a virtio disk yields way better performance. After that, Looking Glass. And I get the feeling it’s going to be the toughest challenge yet to get working.
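For anyone following along, switching the disk to virtio in the libvirt XML is roughly this (path/filename are just examples, and Windows needs the virtio-win drivers loaded during install to even see the disk):

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/win10.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>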

Congrats. I suppose I should try to make some progress on my own setup today, though first I have another issue to fix before I even start the VFIO guide. That, and I need to decide on a MB and get it ordered already. Not sure if I should mark it down as my bad luck or blame it on MSI, but I’m having BIOS issues (endless reboots after making changes), and it’s easy to make the connection when (according to Tom’s Hardware) MSI’s lineup of 400-series chipset MBs only has 16mb BIOS chips. When their competition is using 128-256mb chips, you can’t help but wonder if they’re cutting corners…

edit Well, that further makes me think this board is ready to be RMA’d. No matter what I tried, I couldn’t get it to load the grub menu despite there being a Grub UEFI entry at boot. I installed grub in --removable mode and now it works perfectly (well, there are some errors, but it boots so… let’s ignore that for now :stuck_out_tongue: ).
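The --removable install, in case anyone hits the same boot-entry weirdness (the ESP mount point may differ on your setup):

sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --removable
sudo update-grub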

If I can manage to get Looking Glass working, I may write a guide on this as well (on a different forum) for Ubuntu.

I’m under the assumption the choice of hardware isn’t terribly important so long as it supports VT-x/VT-d or the AMD equivalent, SVM/AMD-Vi. I didn’t see the latter anywhere in my BIOS, but I assume Enumerate all IOMMU in IVRS had something to do with it. Beyond this, I assume anything should work. After that it’s all OS configuration.
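A quick way to sanity-check the CPU-flag half of that, if anyone’s unsure what their chip supports:

lscpu | grep -i virtualization
# should report AMD-V (for SVM) or VT-x (for VMX)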

It varies by BIOS, but my X470 board does have an IOMMU setting. The choice of hardware doesn’t matter as much as it used to, for sure. Back in the 970 chipset days, ASRock was one of the few MB manufacturers that enabled IOMMU in their BIOS. Now it’s nearly everywhere. And yes, your IVRS setting almost certainly was the problem.

IOMMU groups for the chipset devices can be an important factor in choosing a MB, though it’s hard to verify this unless someone has posted theirs for the MB you’re considering. It’s probably more of a problem for non-TR builds? I’m guessing the TR platform has a lot more IOMMU groups, considering the increase in available PCIe lanes.

I’m still pretty new to the TR platform, but I’d say my IOMMU group list was pretty long. It kind of makes sense for the platform (X399) to expose these functions, since the number of cores and PCIe lanes TR can come with would be optimal for a non-server implementation of this feature.

Even with that in mind, I don’t know how many TR owners are doing this.

Well, derp. I had forgotten the last x16 slot on my X470 was PCIe 2.0. Guess I’d been looking at X570 boards too much; that chipset does the split as x8/x8/x4 PCIe 4.0 when a board has three x16 slots. So I had to play musical slots with my hardware and ended up with no place for my Renesas USB card. At least not in an isolated IOMMU group.

Then I noticed one of my USB buses was all by its lonesome, but after looking into it… it sits behind a PCI bridge and seems to share the same device number as two others, just with different functions. Mayhap that is the norm behind a bridge. Either way, from what I see on the KVM wiki, passthrough of devices behind a bridge is all or nothing. So back to square one, without a USB bus to dedicate to my VM.
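If anyone wants to map out their own layout, the tree view makes the bridge/function topology easy to see:

lspci -tv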

edit Well, that was “fun”. I dunno if it was a gift from the June Win10 update (I had to download the newest ISO), but Windows did not like the default sound device virt-manager attached to the VM (ich9, I think?). Anyway, for a while I was thinking it was on the GPU side with its HDMI sound. All I knew was that the process pegging the processor at 100% had to do with third-party sound effects/processing (Windows Audio Device Graph Isolation). After spending almost an hour struggling to download drivers, I gave up, shut down the VM, and tried removing the qemu sound device. Fixed. Which is fine, really, as I had planned on using the HDMI audio anyway. I’m just surprised MS could screw up support for such a staple device. Unless it’s something weird with my instance? Suppose that’s possible.

The good news is I got the GPU passed through fine. If anything, that was the easiest part of this whole process for me. It was all the other details that took some time, i.e. setting up my zfs dataset with qcow2 in mind, getting grub-efi to work (in hindsight I should have just stuck with systemd-boot), and of course getting all my cards slotted in the right place. Now I just have to do some testing to make sure everything is solid before I commit to loading everything into this Win10 install. I’m still noticing some 100% CPU spikes as I type this; hopefully that will go away with the performance tweaks.

Perhaps you might know what’s going on here. I’ve moved on to performance optimization, but when I try to add <memoryBacking><hugepages/></memoryBacking> to the VM config file with virsh, it spits out an error message after I save/quit:

error: invalid argument: could not find capabilities for arch=x86_64 domaintype=kvm 
Failed. Try again? [y,n,i,f,?]: 
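For context, I’m putting the element directly under the domain root, next to <memory>, which I believe is where it belongs (the memory size here is just illustrative):

<domain type='kvm'>
  <memory unit='KiB'>33554432</memory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <!-- rest of the config unchanged -->
</domain>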

I didn’t notice Wendell saying anywhere that we had to use something other than x86_64, so I don’t know why I’m getting this error. Any ideas?

Other than making sure it’s enabled in the kernel like the preceding steps outline, I’m not sure. Did you make sure you rebooted after making changes, and then do a sanity check that the pages were what you thought they were?

cat /proc/meminfo | grep Huge

The output of that command:

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:   16384
HugePages_Free:    16384
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        33554432 kB

Are you referring to the hint? When I checked the /etc/sysctl.conf file, neither line starting with vm. was there, so I added them. Outside of that I have no idea.
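For reference, the two lines I added (the nr_hugepages value lines up with the 16384 x 2048 kB pages shown in my meminfo output above; the group id below is just a placeholder example, check getent group kvm for the real one on your system):

vm.nr_hugepages = 16384
vm.hugetlb_shm_group = 108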

Also, would you happen to know how to actually tie (pin) the CPU cores to the VM? He doesn’t explain it.
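From what I’ve been able to gather so far (not yet tested on my end, so treat this as a sketch), it’s a <cputune> block of vcpupin entries in the domain XML, e.g. a 4-vCPU guest pinned to host threads 8-11:

<vcpu placement='static'>4</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='8'/>
  <vcpupin vcpu='1' cpuset='9'/>
  <vcpupin vcpu='2' cpuset='10'/>
  <vcpupin vcpu='3' cpuset='11'/>
</cputune>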