So I’ve been hoping to try some of the single-GPU VFIO guides that have been floating around (mainly so I can abandon my work laptop when I WFH). I’ve amended the scripts used in this video to match my GPU and desktop environment (GNOME).
I’m running Debian 10.7 and am using virt-manager to start the Windows 10 VM. My hardware is:
CPU: Ryzen 2700
Motherboard: Asus TUF GAMING X570-PRO (WI-FI) ATX AM4 Motherboard
GPU: Sapphire Radeon RX VEGA 56 8 GB Video Card (Reference, jet engine edition)
I think I’ve made some obvious error (probably a variable I’ve missed).
Starting the VM results in a black screen (though I can still SSH to the Debian host). Running “grep libvirtd /var/log/syslog” gives me the following:
Jan 22 16:38:32 userpc kernel: [ 49.160548] CPU: 15 PID: 1118 Comm: libvirtd Tainted: G W 4.19.0-13-amd64 #1 Debian 4.19.160-2
Jan 22 16:38:32 userpc kernel: [ 49.160894] CPU: 15 PID: 1118 Comm: libvirtd Tainted: G W 4.19.0-13-amd64 #1 Debian 4.19.160-2
Jan 22 16:40:11 userpc kernel: [ 13.654154] audit: type=1400 audit(1611333607.820:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/libvirtd" pid=807 comm="apparmor_parser"
Jan 22 16:40:11 userpc kernel: [ 13.654159] audit: type=1400 audit(1611333607.820:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/libvirtd//qemu_bridge_helper" pid=807 comm="apparmor_parser"
Jan 22 16:40:11 userpc libvirtd[1131]: libvirt version: 5.0.0, package: 4+deb10u1 (Guido Günther <[email protected]> Thu, 05 Dec 2019 00:22:14 +0100)
Jan 22 16:40:11 userpc libvirtd[1131]: hostname: userpc
Jan 22 16:40:11 userpc libvirtd[1131]: Non-executable hook script /etc/libvirt/hooks/qemu
Binary file /var/log/syslog matches
When I saw “Non-executable hook script /etc/libvirt/hooks/qemu”, I assumed I had just not set that file and the two scripts it triggers as executable. But I had done.
So I tried to SSH into my Debian host and run the “/bin/vega-vfio-startup.sh” script manually, which resulted in:
modprobe: FATAL: Module amdgpu is in use
At which point the script hangs. Isn’t that the point of the script, though? I know the module is in use; that’s exactly why I’m unloading amdgpu, so that the GPU is freed up for the VM to use.
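To narrow down what is still holding amdgpu, something like the following could be run over SSH just before the modprobe step. This is only a sketch: `module_refcount` and the `SYSFS_ROOT` override are names made up for this example, and it assumes the kernel exposes the usual `refcnt` file for loadable modules under /sys/module.

```sh
#!/bin/sh
# Sketch: report how many users the kernel currently counts for a module.
# SYSFS_ROOT and module_refcount are hypothetical names used only here;
# SYSFS_ROOT lets the function be pointed at a fake tree for a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

module_refcount() {
    refcnt_file="$SYSFS_ROOT/module/$1/refcnt"
    if [ -r "$refcnt_file" ]; then
        # The file holds a single integer: the module's use count.
        cat "$refcnt_file"
    else
        # Module not loaded, built in, or refcnt not exposed.
        echo "unknown"
    fi
}

echo "amdgpu refcount: $(module_refcount amdgpu)"
```

A non-zero count would mean something still has the driver open when `modprobe -r amdgpu` runs. `lsmod | grep amdgpu` shows the same count in its "Used by" column, and `fuser -v /dev/dri/*` (from psmisc, if installed) should list any processes that still have the DRM device nodes open.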
Here’s the qemu file in /etc/libvirt/hooks/qemu that triggers the two scripts:
#!/bin/sh
# Script for Windows_10_Pro
if [[ $1 == "Windows_10_Pro" ]]; then
    if [[ $2 == "prepare" ]]; then
        /bin/vega-vfio-startup.sh
    fi
    if [[ $2 == "release" ]]; then
        /bin/vega-vfio-teardown.sh
    fi
fi
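One thing that may be worth ruling out with this hook: the shebang is /bin/sh, which on Debian is dash by default, while `[[ ... ]]` is bash syntax. The sketch below does the same dispatch using only POSIX `[ ]` and `case`, so it should behave the same under dash or bash. `hook_action` is a hypothetical helper added for illustration, and I have not confirmed that this is related to the "Non-executable hook script" message.

```sh
#!/bin/sh
# Sketch of /etc/libvirt/hooks/qemu restricted to POSIX sh syntax.
# libvirt invokes the hook with the guest name as $1 and the operation as $2.

# hook_action (hypothetical helper) prints the script to run, or nothing.
hook_action() {
    if [ "$1" != "Windows_10_Pro" ]; then
        return 0
    fi
    case "$2" in
        prepare) echo "/bin/vega-vfio-startup.sh" ;;
        release) echo "/bin/vega-vfio-teardown.sh" ;;
    esac
}

# Run the selected script only when there is one for this guest/operation.
script=$(hook_action "${1:-}" "${2:-}")
if [ -n "$script" ]; then
    "$script"
fi
```

Keeping the decision in a function that only prints a path means it can be checked by hand, e.g. `hook_action Windows_10_Pro prepare`, without actually stopping the display manager.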
Here is /bin/vega-vfio-startup.sh:
#!/bin/bash
# Helpful to read output when debugging
set -x
# Stop display manager
systemctl stop gdm.service
# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind
# Unbind EFI-Framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
sleep 5
# Unload AMD drivers
modprobe -r amdgpu
# Unbind the GPU from display driver
virsh nodedev-detach pci_0000_0b_00_0
virsh nodedev-detach pci_0000_0b_00_1
# Load VFIO kernel module
modprobe vfio-pci
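The startup script only unbinds vtcon0. On some machines there is more than one entry under /sys/class/vtconsole (for example a dummy console plus the framebuffer console), so a variant that loops over all of them might be worth trying. This is a sketch under that assumption; `unbind_vtconsoles` and `SYSFS_ROOT` are hypothetical names, and the function is only defined here, not called.

```sh
#!/bin/sh
# Sketch: unbind every virtual console instead of only vtcon0.
# SYSFS_ROOT is a hypothetical override so the loop can be dry-run
# against a fake directory tree instead of the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

unbind_vtconsoles() {
    for bind_file in "$SYSFS_ROOT"/class/vtconsole/vtcon*/bind; do
        # If nothing matched, the pattern is left as a literal string; skip it.
        [ -e "$bind_file" ] || continue
        echo 0 > "$bind_file"
    done
}
```

In the startup script this would replace the single `echo 0 > /sys/class/vtconsole/vtcon0/bind` line with a call to `unbind_vtconsoles`, and the teardown script would need the matching loop writing 1.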
Here is the /bin/vega-vfio-teardown.sh script:
#!/bin/bash
set -x
# Unload VFIO-PCI Kernel Driver
modprobe -r vfio-pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
# Re-Bind GPU to AMD Driver
virsh nodedev-reattach pci_0000_0b_00_1
virsh nodedev-reattach pci_0000_0b_00_0
# Rebind VT consoles
echo 1 > /sys/class/vtconsole/vtcon0/bind
# Re-Bind EFI-Framebuffer
echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind
# Load AMD driver
modprobe amdgpu
# Restart Display Manager
systemctl start gdm.service
Here’s the XML of my VM:
<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
virsh edit Windows_10_Pro
or other application using the libvirt API.
-->
<domain type='kvm'>
<name>Windows_10_Pro</name>
<uuid>75ab6ccc-91dd-426c-9a21-5ed5e05cc37f</uuid>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://microsoft.com/win/10"/>
</libosinfo:libosinfo>
</metadata>
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<vcpu placement='static'>14</vcpu>
<os>
<type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state='on'/>
<vapic state='on'/>
<spinlocks state='on' retries='8191'/>
</hyperv>
<vmport state='off'/>
</features>
<cpu mode='host-model' check='partial'>
<model fallback='allow'/>
</cpu>
<clock offset='localtime'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
<timer name='hypervclock' present='yes'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/home/user/Downloads/Win10_20H2_v2_English_x64.iso'/>
<target dev='sdb' bus='sata'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/home/user/Media/4TB Internal/KVM Images/Windows_10_Pro.qcow2'/>
<target dev='sdc' bus='sata'/>
<address type='drive' controller='0' bus='0' target='0' unit='2'/>
</disk>
<controller type='usb' index='0' model='qemu-xhci' ports='15'>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</controller>
<controller type='sata' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='1' port='0x10'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='2' port='0x11'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='3' port='0x12'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
</controller>
<controller type='pci' index='4' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='4' port='0x13'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
</controller>
<controller type='pci' index='5' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='5' port='0x14'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
</controller>
<interface type='network'>
<mac address='52:54:00:45:65:25'/>
<source network='default'/>
<model type='e1000e'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<serial type='pty'>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<input type='tablet' bus='usb'>
<address type='usb' bus='0' port='1'/>
</input>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<sound model='ich9'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
</sound>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</hostdev>
<redirdev bus='usb' type='spicevmc'>
<address type='usb' bus='0' port='2'/>
</redirdev>
<redirdev bus='usb' type='spicevmc'>
<address type='usb' bus='0' port='3'/>
</redirdev>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</memballoon>
</devices>
</domain>
Feel like I’ve not set something really obvious, but not sure what.