I decided to bump this for a few reasons.
In the meantime, I have tried everything I could think of to get it working, except swapping the GT 1030 into slot 1 and the guest GPU into slot 2.
Here’s my VM configuration:
<domain type="kvm">
  <name>vm1</name>
  <uuid>13ca15d0-86e4-43db-a2b6-61c502b38743</uuid>
  <memory unit="KiB">49325056</memory>
  <currentMemory unit="KiB">49325056</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement="static" current="4">16</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-4.2">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/vm1_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="1234567890ab"/>
    </hyperv>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="8" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="topoext"/>
  </cpu>
  <clock offset="localtime">
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <controller type="usb" index="0" model="qemu-xhci">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0xb"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0xc"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x4"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0xd"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x5"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0xe"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x6"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0xf"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x7"/>
    </controller>
    <interface type="bridge">
      <mac address="52:54:00:6b:0a:b0"/>
      <source bridge="virbr0"/>
      <model type="e1000"/>
      <address type="pci" domain="0x0000" bus="0x02" slot="0x01" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <driver name="vfio"/>
      <source>
        <address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
      </source>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
      </source>
      <rom file="/usr/share/vgabios/GA102_edited.rom"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0" multifunction="on"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x08" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>
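To make it easier to compare with other people's configs, here is a small Python sketch (Python is just my choice for illustration; the script name and usage line are hypothetical) that lists every passed-through PCI device with its host address and the address it gets inside the VM. In the XML above, the video function (host 0000:08:00.0) sits on guest bus 0x08 and the audio function (host 0000:08:00.1) on guest bus 0x09, i.e. on two separate root ports.

```python
import sys
import xml.etree.ElementTree as ET


def _fmt(addr: ET.Element) -> str:
    """Format a libvirt <address> element as DDDD:BB:SS.F."""
    d, b, s, f = (int(addr.get(k), 16) for k in ("domain", "bus", "slot", "function"))
    return f"{d:04x}:{b:02x}:{s:02x}.{f:x}"


def hostdev_map(domain_xml: str) -> list:
    """Return (host address, guest address) pairs for every PCI <hostdev>."""
    root = ET.fromstring(domain_xml)
    pairs = []
    for hostdev in root.iter("hostdev"):
        if hostdev.get("type") != "pci":
            continue
        host = hostdev.find("source/address")  # physical device on the host
        guest = hostdev.find("address")        # direct child only: slot inside the VM
        pairs.append((_fmt(host), _fmt(guest) if guest is not None else "auto"))
    return pairs


# Only runs as a CLI when a file is given, e.g. (assumed workflow):
#   virsh dumpxml vm1 > vm1.xml; python3 hostdev_map.py vm1.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for host, guest in hostdev_map(fh.read()):
            print(f"host {host} -> guest {guest}")
```

For the XML above this should print three lines: host 0000:01:00.0 -> guest 0000:05:00.0 (the NVMe drive), host 0000:08:00.0 -> guest 0000:08:00.0 and host 0000:08:00.1 -> guest 0000:09:00.0.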
I dumped the vBIOS of my guest GPU (the RTX 3090) and edited it, cutting away the header, which is apparently unnecessary.
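The header removal can also be scripted instead of done in a hex editor. The sketch below follows the recipe that circulates in passthrough guides (an assumption on my part, not something verified for this card): find the text "VIDEO" in the dump and drop everything before the 0x55 0xAA option-ROM signature that precedes it. The file names in the usage comment are hypothetical.

```python
import sys


def strip_vbios_header(rom: bytes) -> bytes:
    """Drop everything before the option-ROM signature (0x55 0xAA) that
    precedes the first b"VIDEO" string, mirroring the hex-editor recipe."""
    marker = rom.find(b"VIDEO")
    if marker == -1:
        raise ValueError('no "VIDEO" string found; is this an NVIDIA vBIOS dump?')
    start = rom.rfind(b"\x55\xaa", 0, marker)  # nearest signature before the marker
    if start == -1:
        raise ValueError("no 0x55AA signature before the VIDEO string")
    return rom[start:]  # unchanged if the dump already starts at the signature


# Hypothetical usage: python3 strip_vbios.py GA102.rom GA102_edited.rom
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fh:
        data = fh.read()
    trimmed = strip_vbios_header(data)
    with open(sys.argv[2], "wb") as fh:
        fh.write(trimmed)
    print(f"removed {len(data) - len(trimmed)} header bytes")
```

Searching backwards from "VIDEO" (rather than taking the very first 0x55 0xAA in the file) matches the "first U before VIDEO" wording of that recipe.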
I added a few parameters to GRUB_CMDLINE_LINUX_DEFAULT, so the line now looks like this:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt vfio-pci.ids=10de:2204,10de:1aef vfio_iommu_type1.allow_unsafe_interrupts=1 ignore_msrs=1 video=vesafb:off,efifb:off fbcon=map:1"
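A quick way to sanity-check that line is to split it into parameters and group them by module prefix. This is only a sketch (the function names are made up for illustration); it shows that vfio-pci.ids binds 10de:2204 and 10de:1aef, and that ignore_msrs=1 carries no module prefix at all. As far as I know, the kernel parameter documentation lists that option as kvm.ignore_msrs, and un-prefixed module options on the command line are not handed to the module.

```python
import shlex
import sys
from collections import defaultdict


def kernel_params_from_grub(line: str) -> list:
    """Split a GRUB_CMDLINE_LINUX_DEFAULT="..." line into kernel parameters."""
    _, _, rhs = line.partition("=")
    (value,) = shlex.split(rhs)  # removes the shell quotes around the value
    return value.split()


def group_by_module(params: list) -> dict:
    """Group parameters by their "module." prefix; '' holds un-prefixed ones."""
    groups = defaultdict(dict)
    for token in params:
        key, _, value = token.partition("=")
        module, dot, name = key.partition(".")
        if not dot:  # no module prefix, e.g. "iommu=pt" or "ignore_msrs=1"
            module, name = "", key
        groups[module][name] = value
    return dict(groups)


# Assumed usage: python3 grub_params.py /etc/default/grub
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("GRUB_CMDLINE_LINUX_DEFAULT="):
                groups = group_by_module(kernel_params_from_grub(line))
                for module, opts in sorted(groups.items()):
                    print(module or "(no module prefix)", opts)
```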
I created VMs with both the Q35 and the i440FX chipsets, but it makes no difference.
If I exclude just the video adapter (0000:08:00.0) but leave the audio adapter (0000:08:00.1) in place, the system seems to run. I can't see any output, because I removed everything related to Spice/QXL/Cirrus Logic video, but the Performance tab keeps showing activity: CPU usage and network I/O both stay above 0%.
If I add the video adapter, CPU usage stays above 0% for only about 3 to 4 ticks and then drops to 0%.
Also, the system boots up just fine without the guest GPU (using Cirrus Logic video output instead); I can see the Windows UI.
I have also passed through the whole PCI device for my NVMe drive, and that works.
lspci -nnv shows the guest GPU (0000:08:00.0 and 0000:08:00.1) with "Kernel driver in use: vfio-pci". It also lists "Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia", though, and I am not sure whether this is correct and, if not, what I should do about it.
I have spent at least 20 hours in total trying to get PCIe passthrough to work, and I am close to giving up. Please help me fix it; any ideas on what to try are welcome.
Edit: /var/log/libvirt/qemu/vm1.log looks like this (for one run):
2021-05-09 08:45:46.159+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.8 (Victor Manuel Tapia King <[email protected]> Fri, 19 Feb 2021 17:15:56 +0100), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.15, kernel: 5.8.0-50-generic, hostname: divstar-pc-ubuntu
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-1-vm1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=vm1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-vm1/master-key.aes \
-blockdev '{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/vm1_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,mem-merge=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu host,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=1234567890ab,host-cache-info=on,l3-cache=off \
-m 48169 \
-overcommit mem-lock=off \
-smp 4,maxcpus=16,sockets=1,cores=8,threads=2 \
-uuid 13ca15d0-86e4-43db-a2b6-61c502b38743 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=0x9,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x6 \
-device pcie-root-port,port=0xf,chassis=9,id=pci.9,bus=pcie.0,addr=0x1.0x7 \
-device qemu-xhci,id=usb,bus=pci.3,addr=0x0 \
-netdev tap,fd=33,id=hostnet0 \
-device e1000,netdev=hostnet0,id=net0,mac=52:54:00:6b:0a:b0,bus=pci.2,addr=0x1 \
-device vfio-pci,host=0000:01:00.0,id=hostdev0,bootindex=1,bus=pci.5,addr=0x0 \
-device vfio-pci,host=0000:08:00.0,id=hostdev1,bus=pci.8,multifunction=on,addr=0x0,romfile=/usr/share/vgabios/GA102_edited.rom \
-device vfio-pci,host=0000:08:00.1,id=hostdev2,bus=pci.9,addr=0x0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-05-09 08:45:46.159+0000: Domain id=1 is tainted: host-cpu
2021-05-09T08:46:20.716676Z qemu-system-x86_64: terminating on signal 15 from pid 998 (/usr/sbin/libvirtd)
2021-05-09 08:46:22.527+0000: shutting down, reason=destroyed
I have no idea what to do, because the log does not even report an error, such as being unable to read from or write to the PCIe device. I doubt the tainted CPU message is responsible for the problem, because it still appears when I remove the video adapter.
Furthermore, if I remove the video adapter, I can shut down the VM gracefully (reason=shutdown), whereas with it added, the only way to turn off the VM is by force (reason=destroyed).
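When comparing runs with and without the video adapter, it can help to pull the relevant lines out of vm1.log automatically. A minimal sketch (function name hypothetical) that collects taint flags, QEMU termination signals and libvirt shutdown reasons; signal 15 is SIGTERM, here sent by libvirtd during the forced power-off. As far as I know, libvirt records the host-cpu taint whenever the CPU mode is host-passthrough, as it is in the XML above, which would explain why it appears with or without the GPU.

```python
import re
import sys


def summarize_qemu_log(text: str) -> dict:
    """Collect taint flags, QEMU termination signals and libvirt shutdown reasons."""
    return {
        "taints": re.findall(r"is tainted: (\S+)", text),
        "signals": [int(s) for s in re.findall(r"terminating on signal (\d+)", text)],
        "shutdown_reasons": re.findall(r"shutting down, reason=(\w+)", text),
    }


# Assumed usage: python3 logsum.py /var/log/libvirt/qemu/vm1.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(summarize_qemu_log(fh.read()))
```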
I cannot get past the black screen and get the OS to boot. However, the OS boots just fine in the VM without the GPU, and it also boots with the GPU when I restart the host and choose the Windows drive.
Update
So after some more tinkering, it seems the cause was a BIOS setting. In particular, there was one whose description said I should disable it if I want to boot legacy devices. I'll try to figure out exactly which one it was, but I got the VM booting Windows with the card!