Asus ROG Strix x570 e-gaming - Host GPU in 3rd PCIe_x16 slot, Guest GPU in 1st PCIe_x16 slot?

Hello,

I have an Asus ROG Strix x570 e-gaming mainboard and an AMD Ryzen 5900X. I also bought an Asus TUF RTX3090, which I plan on using via PCIe-passthrough.

My current problem is the following: the mainboard has 3 PCIe_x16 slots. The first is the fastest one and I have the RTX3090 in there. To give the blower-style coolers enough air, I decided to put my Asus GT1030, which I am planning on using for Linux, into the third PCIe_x16 slot (I don’t really care much about graphics performance on Linux, but I badly needed something that can drive my 5120x2160@60Hz monitor over DisplayPort 1.4 - which the GT1030 does just fine).

The card somewhat works if I do not attach a monitor to the main GPU.

In the BIOS I have a “GPU Post” feature and a “Bus interface” combobox, which lets me choose which PCIe_x16 slot to use for the primary GPU. However, the setting does not seem to get saved with CSM disabled: the board always boots with PCIe-x16_1 (the RTX3090). If I add the vfio-pci.ids of the RTX3090 (there are 2 entries in the IOMMU group of this GPU and I added both to /etc/default/grub, followed by sudo update-grub) and select “Ubuntu” from the GRUB menu, it hangs completely at boot-up, showing the Asus ROG logo.
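For reference, the IOMMU grouping mentioned above can be read straight from sysfs. The following is a minimal sketch, assuming the usual Linux layout under /sys/kernel/iommu_groups; the function name `list_iommu_groups` and its `root` parameter are made up for illustration.

```python
import os

def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} as exposed by sysfs."""
    groups = {}
    if not os.path.isdir(root):
        # Directory is absent when the IOMMU is disabled or unsupported.
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            # Each entry in devices/ is named after a PCI address.
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups

# Example use on the host:
#   for group, devices in sorted(list_iommu_groups().items()):
#       print(group, devices)
```

Every device that shares a group with the GPU generally has to be handed to vfio-pci (or left without a host driver) for passthrough to work, which is why both entries of the RTX3090 were added.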

Does anyone have an idea of what I should do? Do I really have to place the GT1030 into the second - or worse: the first - slot? I’d like to avoid that.

I decided to bump this for a few reasons.

Meanwhile I have tried everything I could to get it to work - except moving the GT1030 into slot 1 and the guest GPU into slot 2.

Here’s my VM configuration:

<domain type="kvm">
  <name>vm1</name>
  <uuid>13ca15d0-86e4-43db-a2b6-61c502b38743</uuid>
  <memory unit="KiB">49325056</memory>
  <currentMemory unit="KiB">49325056</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement="static" current="4">16</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-4.2">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/vm1_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="1234567890ab"/>
    </hyperv>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="8" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="topoext"/>
  </cpu>
  <clock offset="localtime">
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <controller type="usb" index="0" model="qemu-xhci">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0xb"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0xc"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x4"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0xd"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x5"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0xe"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x6"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0xf"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x7"/>
    </controller>
    <interface type="bridge">
      <mac address="52:54:00:6b:0a:b0"/>
      <source bridge="virbr0"/>
      <model type="e1000"/>
      <address type="pci" domain="0x0000" bus="0x02" slot="0x01" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <driver name="vfio"/>
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
      </source>
      <rom file="/usr/share/vgabios/GA102_edited.rom"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0" multifunction="on"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x08" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>
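To make the passthrough layout in this configuration easier to read, the host-to-guest PCI address mapping can be extracted from the domain XML. This is a minimal sketch using only the Python standard library; `hostdev_mapping` and `split_devices` are hypothetical helper names. It also flags host devices whose functions end up on different guest bus/slot pairs, as happens here with 08:00.0 and 08:00.1. Whether that split has anything to do with the black screen is not established in this thread.

```python
import xml.etree.ElementTree as ET

def _fmt(addr):
    """Render a libvirt <address> element as dddd:bb:ss.f."""
    return "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(addr.get("domain", "0"), 16), int(addr.get("bus"), 16),
        int(addr.get("slot"), 16), int(addr.get("function"), 16))

def hostdev_mapping(domain_xml):
    """Return [(host_address, guest_address), ...] for each PCI <hostdev>."""
    root = ET.fromstring(domain_xml)
    pairs = []
    for dev in root.findall("./devices/hostdev[@type='pci']"):
        host = dev.find("./source/address")   # address on the host
        guest = dev.find("./address")         # address inside the guest
        if host is not None and guest is not None:
            pairs.append((_fmt(host), _fmt(guest)))
    return pairs

def split_devices(pairs):
    """Host devices whose functions land on different guest bus/slot pairs."""
    seen = {}
    for host, guest in pairs:
        # Drop the ".function" suffix to compare whole devices.
        seen.setdefault(host.rsplit(".", 1)[0], set()).add(guest.rsplit(".", 1)[0])
    return sorted(h for h, g in seen.items() if len(g) > 1)
```

The input would typically be the output of `virsh dumpxml vm1` read into a string.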

I dumped the vBIOS of my guest GPU (the RTX3090) and altered it, cutting away the apparently unnecessary header.

I added a few parameters to GRUB_CMDLINE_LINUX_DEFAULT, so now this line looks like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt vfio-pci.ids=10de:2204,10de:1aef vfio_iommu_type1.allow_unsafe_interrupts=1 ignore_msrs=1 video=vesafb:off,efifb:off fbcon=map:1"
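A quick way to confirm that the IDs actually reached the kernel after update-grub and a reboot is to parse the running command line. This is a minimal sketch; `vfio_pci_ids` is a hypothetical helper that takes the contents of /proc/cmdline as a string. It accepts both `vfio-pci.ids` and `vfio_pci.ids`, since the kernel treats dashes and underscores in parameter names as equivalent, but not a spelling such as `vfio.pci.ids`.

```python
def vfio_pci_ids(cmdline):
    """Return the vendor:device IDs passed via vfio-pci.ids on a kernel command line."""
    ids = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Normalise underscores so vfio_pci.ids is accepted as well.
        if sep and key.replace("_", "-") == "vfio-pci.ids":
            ids.extend(v.lower() for v in value.split(",") if v)
    return ids

# Example use on the host:
#   with open("/proc/cmdline") as f:
#       print(vfio_pci_ids(f.read()))
```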

I created VMs with Q35 and i440FX “chipsets”, but it does not make a difference.

If I exclude just the video adapter (0000:08:00.0), but leave the audio adapter in place (0000:08:00.1), the system seems to run. I don’t get to see the image, because I removed anything related to Spice/QXL/Cirrus Logic output, but in the performance tab it continuously does something and at least keeps showing CPU usage > 0% and Network I/O > 0%.
If I add the video adapter, it just shows CPU usage for about 3 to 4 ticks > 0% and then drops to 0%.

Also: the system boots up just fine without the guest GPU (and with Cirrus Logic video output instead). I can see the Windows UI.

I have passed through the whole PCI host device for my NVMe drive - and it works.

lspci -nnv shows the guest GPU (0000:08:00.0 and 0000:08:00.1) as using Kernel driver in use: vfio-pci. It does say Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia though and I am not sure if this is correct and if not, what I should do about that.
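On the lspci output: the "Kernel modules" line lists modules that are able to claim the device, while "Kernel driver in use" is the driver actually bound to it, so the combination described above is what a vfio-pci setup normally looks like. The binding can also be checked via sysfs. A minimal sketch, assuming the standard /sys/bus/pci/devices layout; `bound_driver` and its `sysfs_root` parameter are hypothetical names.

```python
import os

def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        # No driver symlink means no driver currently owns the device.
        return None
    # The symlink points at .../drivers/<driver-name>.
    return os.path.basename(os.readlink(link))

# Example use on the host:
#   for addr in ("0000:08:00.0", "0000:08:00.1"):
#       print(addr, bound_driver(addr))
```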

I have spent at least 20 hours in total trying to get PCIe-passthrough to work, and I am close to giving up. Please help me fix it - any ideas on what to try are welcome.

Edit: the /var/log/libvirt/qemu/vm1.log looks like this (for one go):

2021-05-09 08:45:46.159+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.8 (Victor Manuel Tapia King <[email protected]> Fri, 19 Feb 2021 17:15:56 +0100), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.15, kernel: 5.8.0-50-generic, hostname: divstar-pc-ubuntu
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-1-vm1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=vm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-vm1/master-key.aes \
-blockdev '{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/vm1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu host,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,host-cache-info=on,l3-cache=off \
-m 48169 \
-overcommit mem-lock=off \
-smp 4,maxcpus=16,sockets=1,cores=8,threads=2 \
-uuid 13ca15d0-86e4-43db-a2b6-61c502b38743 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=0x9,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x6 \
-device pcie-root-port,port=0xf,chassis=9,id=pci.9,bus=pcie.0,addr=0x1.0x7 \
-device qemu-xhci,id=usb,bus=pci.3,addr=0x0 \
-netdev tap,fd=33,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:6b:0a:b0,bus=pci.2,addr=0x1 \
-device vfio-pci,host=0000:01:00.0,id=hostdev0,bootindex=1,bus=pci.5,addr=0x0 \
-device vfio-pci,host=0000:08:00.0,id=hostdev1,bus=pci.8,multifunction=on,addr=0x0,romfile=/usr/share/vgabios/GA102_edited.rom \
-device vfio-pci,host=0000:08:00.1,id=hostdev2,bus=pci.9,addr=0x0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-05-09 08:45:46.159+0000: Domain id=1 is tainted: host-cpu
2021-05-09T08:46:20.716676Z qemu-system-x86_64: terminating on signal 15 from pid 998 (/usr/sbin/libvirtd)
2021-05-09 08:46:22.527+0000: shutting down, reason=destroyed

I have no clue what to do, because the log does not even state that it, for example, cannot read from or write to the PCIe device. I do not think the tainted CPU message is responsible for the problem, because it still appears when I remove the video adapter.
Furthermore, if I remove the video adapter, I seem to be able to shut down the VM gracefully (reason=shutdown), while if I add it, my only way to turn off the VM is by force (reason=destroyed).

I cannot seem to get past the black screen and get the OS to boot. However, the OS boots just fine in the VM without the GPU, and it also boots fine with the GPU on bare metal when I restart the host and choose the Windows drive.

Update
So after some more tinkering, it seems that the reason was a BIOS setting. In particular, there was one whose description said that I should disable it if I want to boot legacy devices. I’ll try to figure out exactly which one it was, but I got the VM with Windows and the card booted up!

Disable CSM, enable Above 4G Decoding, and enable SAM/CAM/Resizable BAR support. The 3090 has a lot of BAR space.

It seems that turning off Above 4G Decoding (and Resizable BAR, because I cannot enable one without the other) did the trick.
However, this very Windows install boots fine on bare metal with both of these settings enabled. Any other ideas?
