Documenting my re-immersion with passthrough

Hi all,

Been a while since I’ve done passthrough, and I’m running into some troubles with it, so I thought I’d document it here so others can learn or be helped by it.

Today’s struggle is getting my 6900xt to smoothly pass into a VM. Currently, I suspect OVMF is being a special snowflake. I got it to boot smoothly once, but now it’s not booting properly.

OS: Arch
Kernel: linux-vfio (aur)

Tweaks:

  • ACS override: multifunction,downstream
  • kvm.ignore_msrs=1
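
A quick way to confirm that both tweaks actually reached the running kernel is to inspect the kernel command line. This is only a sketch: it assumes the ACS override is spelled `pcie_acs_override=` (as in the ACS override patch) and that `kvm.ignore_msrs` was set on the command line rather than through modprobe options. The helper names `parse_cmdline` and `missing_tweaks` are made up for illustration.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def missing_tweaks(cmdline: str) -> list:
    """Return the expected tweaks that are absent or lack an expected value."""
    # Assumed spellings of the two tweaks listed above.
    expected = {
        "pcie_acs_override": {"downstream", "multifunction"},
        "kvm.ignore_msrs": {"1"},
    }
    params = parse_cmdline(cmdline)
    missing = []
    for name, wanted in expected.items():
        # Comma-separated values are compared as a set, so order does not matter.
        got = set(params.get(name, "").split(",")) - {""}
        if not wanted <= got:
            missing.append(name)
    return missing
```

On the host, `missing_tweaks(open("/proc/cmdline").read())` should return an empty list if both parameters are present.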

Current iteration of the xml:


<domain type='kvm'>
  <name>win10-gpu</name>
  <uuid>22ed55d2-47c0-4ac3-97a9-9e31228498fd</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <vcpu placement='static'>16</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/ovmf_code_x64.bin</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10-gpu_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'/>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/win10-gpu.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/sarge/Downloads/Win10_20H2_v2_English_x64.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/usr/share/virtio/virtio-win.iso'/>
      <target dev='sdc' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x19'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:99:5c:86'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>
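
A minimal sketch for cross-checking the `<hostdev>` entries above against `lspci`: it pulls the host-side `<source>` addresses out of a domain XML and returns them in `bus:slot.function` form. The function name `hostdev_sources` is hypothetical; only Python's standard `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET


def hostdev_sources(domain_xml: str) -> list:
    """Return host PCI addresses of all PCI <hostdev> entries as 'bb:ss.f'."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for hostdev in root.iterfind("./devices/hostdev[@type='pci']"):
        # <source><address/> is the host device; the sibling <address/> is the guest slot.
        addr = hostdev.find("./source/address")
        if addr is None:
            continue
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        addresses.append(f"{bus:02x}:{slot:02x}.{function:x}")
    return addresses
```

Feeding it the output of `virsh dumpxml win10-gpu` for the XML above should list 09:00.0 through 09:00.3, which can then be compared with the addresses `lspci` reports for the card being passed through.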

So what happens is when I turn on the VM, it hangs with 1 CPU pinned for a good 5-10 minutes, before extremely slowly booting, causing full system lockups and jittering as Windows is booting.

Keep in mind, this is not an optimized VM yet. I’m just trying to get it to boot and not be a broken POS at the moment.

Hmmm, ignoring MSRs didn’t help the stuttering.

Going from 16 threads to 1 and from 32 GB of RAM to 8 GB didn’t help either.

I wonder if there’s some tomfoolery going on with running 2 AMD GPUs in the same IOMMU group, but then invalidating that for purposes of passthrough.
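
To see how the two cards are actually grouped, the kernel's view can be read from sysfs. A rough sketch, assuming the standard `/sys/kernel/iommu_groups/<n>/devices/` layout; with the ACS override active, this shows the groups as the kernel reports them after the override, not the hardware's real isolation. The names `iommu_groups` and `shared_group` are hypothetical.

```python
import os


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled, or the path does not exist on this host
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if os.path.isdir(devices_dir):
            groups[int(group)] = sorted(os.listdir(devices_dir))
    return groups


def shared_group(groups: dict, addr_a: str, addr_b: str) -> bool:
    """True if both PCI addresses sit in the same IOMMU group."""
    return any(addr_a in devs and addr_b in devs for devs in groups.values())
```

Addresses in sysfs carry the PCI domain prefix, so a device that `lspci` shows as `09:00.0` appears here as `0000:09:00.0`.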

Welp, it’s still stuttering.

I’m reading that people have had issues with virtio drivers, so I’m going to remove the virtio components and see if that fixes it. I really hope I don’t have to pass the sata controller through.

No dice. :confused: what the fuck!

Wait, what’s this?

Apr 30 19:08:41 monolith libvirtd[4576]: unable to open '/sys/fs/cgroup/machine.slice/machine-qemu\x2d11\x2dwin10\x2dgpu.scope/': No such file or directory
Apr 30 19:08:41 monolith libvirtd[4576]: Failed to remove cgroup for win10-gpu

Welp, that’s new.

Welp, switched back to BIOS (SeaBIOS) instead of OVMF, and suddenly this bitch is booting.

Why is OVMF always broken?

It’s been like this for 3 years now.

So it’s looking more like this issue is somehow related to interrupts. I’m able to boot, but the second the GPU driver gets loaded by Windows, the system starts to lock up. I may have seen this before, but I don’t know the solution.

Anyone who’s got advice on how to proceed is very welcome to toss in suggestions at this point.

At the risk of giving outdated advice, as I haven’t passed through any AMD card newer than a 7790 (and that was in 2017): do you need primary passthrough?

For me, AMD always worked well as secondary passthrough, i.e. without OVMF (is that what your “switching back to BIOS” refers to?). Primary passthrough (with OVMF) was never recommended for AMD in the past, and I’m not sure it was supposed to work at all.

Even with secondary passthrough you should be able to make Windows switch display to the passed-through GPU immediately after boot.

Hmm, so is this after you stopped using OVMF? If so I’m out of advice…

I’m on an ITX build and the GPUs won’t fit the other way around. The 6900xt is on a waterblock and the wx7100 (host GPU) is air cooled. There would be no airflow for the wx7100 if I swapped them.


Yes, I got it working-ish with BIOS configuration, but at the moment, it seems that the minute that I start actually interacting with the GPU, it chokes the whole host system and I need to stop the VM.

It appears that it doesn’t matter if it’s using OVMF or not.


So let me expound on this a bit. It seems that OVMF tries to initialize the GPU, and that causes it to choke, no matter what happens. Using SeaBIOS instead of OVMF solves the failure-to-POST problem; however, when Windows tries to initialize the GPU, it starts to choke the host again.

Ok, I see, so it seems the card fails whenever it gets initialized in the VM, regardless of how it comes to that point.

I might be using the terminology wrong; by “primary” I simply meant using OVMF as opposed to SeaBIOS. Anyway, it doesn’t seem to matter for your problem.

Have you tried getting the wx7100 through? Just to rule out platform-related problems. (if that’s possible without switching slots.)

I can definitely try that, it was on my list of things to attempt today.

So I noticed something strange in my lspci output, and I wonder if it’s got something to do with it.

06:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c0)
07:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c0)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
08:00.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a6
08:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a4
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100]
09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]

Anyone know what the 06 and 07 entries are? “Upstream port” and “Downstream port” is a new situation for me. This only appears with the bifurcated risers. When the 6900xt is in the system alone with the non-bifurcated riser, these entries do not exist.

They’re both using the pcieport driver.
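
For reference, the driver bound to a device can be read straight from sysfs, which is handy for re-checking after each attempt to rebind something. A small sketch, assuming the usual `/sys/bus/pci/devices/<addr>/driver` symlink; `bound_driver` is a hypothetical helper name.

```python
import os


def bound_driver(addr: str, root: str = "/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(root, addr, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device does not exist
    # The symlink points at the driver directory; its last component is the name.
    return os.path.basename(os.readlink(link))
```

On this system, `bound_driver("0000:06:00.0")` and `bound_driver("0000:07:00.0")` would be expected to return `pcieport`, matching the observation above.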

Possibly it has something to do with CrossFire being bridgeless and baked into the PCIe slot, and it gets exposed by the riser card…

In fact, that’s the only thing that comes to mind for an upstream and a downstream connection on a switch.

Could be. I’m wondering if there’s some P2P PCIe communication happening between the 6900xt and wx7100 that’s causing problems. They were originally in the same IOMMU group.


My suspicion is that it’s for P2P communication, but I really don’t know what it is or if it can be disabled.


I wonder if stubbing those ports would solve my issues.

I think it should. It was a workaround back in the 6990 days to use them as 6970s in a VM.

Welp, let’s see what happens.

Alright, neither pci-stub nor vfio-pci will bind to them. They just stick with the pcieport driver.

If I remove the “downstream port” entry, it also completely removes the 6900xt, so that’s incredibly strange.

Not a clue how to proceed lol

Attempting to pass the ports themselves through results in “non-endpoint pci devices cannot be assigned to guests” so that’s clearly not helpful.
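
That error fits the ports being bridges rather than endpoints. A rough way to tell the two apart from sysfs is the PCI class code: base class 0x06 means a bridge device, and 0x0604 specifically a PCI-to-PCI bridge. This is only an approximation of whatever check libvirt itself performs, and the helper names `is_pci_bridge` and `read_class` are made up.

```python
def is_pci_bridge(class_code: str) -> bool:
    """True if a sysfs 'class' value such as '0x060400' has base class 0x06."""
    # The top byte of the 24-bit class code is the base class.
    return (int(class_code.strip(), 16) >> 16) == 0x06


def read_class(addr: str, root: str = "/sys/bus/pci/devices") -> str:
    """Read the raw class code of a PCI device, e.g. addr='0000:07:00.0'."""
    with open(f"{root}/{addr}/class") as fh:
        return fh.read()
```

For example, `is_pci_bridge(read_class("0000:07:00.0"))` would be expected to be true for the downstream port listed earlier, while the VGA function of a GPU (class `0x030000`) would come back false.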

Okay, so I installed gnif’s vendor-reset driver and now the vm starts up properly, but I get error 43 from the GPU drivers. :confused:

I stand corrected. My host was just hard reset by my motherboard’s hardware watchdog. I’ve never seen that before.

Rebooting with wx7100 vfio’d, let’s see what happens here.

vfio driver took the card, testing a VM boot now.

Well, the “basic display adapter” threw an error in the VM, let’s see what the radeon pro drivers do.

Well, the wx7100 works. :frowning:

I think the only real option for testing, going forward, is to test swapping the slots.

Now I am stumped as to why the 6900xt gave you code 43.

Same here dude. I’m ready to give up and just focus my efforts on Proton.

Which is upsetting because this was a $450 lesson.


Alright, let’s try changing the hardware configuration.

That’s the only thing left to do, lol, since the card in slot 2, err, slot 1 is the one that passes through with no big problem.