VFIO VM Crash on driver install Fedora 29 W10 RX580

Hello guys,
My problem is that when I try to install the AMD GPU drivers in my Windows 10 guest, the screen goes black and the VM becomes unresponsive. After rebooting the host I can get the VM running again, but once I shut the guest down I can’t start it again without getting “no signal” on my monitor and/or a complete system lockup.
I’ve read about a reinitialization bug on some AMD cards, but I haven’t found any topics about the card I am using; pretty much everything I found said this card would be fine. I’ve tried several different BIOS versions, but nothing really helped.

PC Specs:
CPU: Ryzen 7 1700X
RAM: 16GB Flare X 3200C14
MB: Crosshair VI Hero BIOS 6401
Host GPU: EVGA GTX 980 Superclocked
Guest GPU: Sapphire RX580 8GB Nitro+

Host setup:
Distro:
Fedora 29 Kernel 4.19.9-300.fc29.x86_64

Grub:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.luks.uuid=luks-db6e47de-238c-48b3-94a6-7cdf32683d54 rd.lvm.lv=fedora/swap rhgb quiet"
GRUB_CMDLINE_LINUX_DEFAULT="iommu=1 amd_iommu=on rd.driver.pre=vfio-pci"
GRUB_DISABLE_RECOVERY="true"

vfio.conf:

options vfio-pci ids=10de:13c0,10de:0fbb

dracut conf:

add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"

qemu config:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>W10-Passthrough</name>
  <uuid>f75472f1-956b-4989-975d-a89db2ae8744</uuid>
  <memory unit='KiB'>6291456</memory>
  <currentMemory unit='KiB'>6291456</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <iothreads>4</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
    <iothreadpin iothread='2' cpuset='2-3'/>
    <iothreadpin iothread='3' cpuset='4-5'/>
    <iothreadpin iothread='4' cpuset='6-7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/W10-Passthrough_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>Opteron_G3</model>
    <topology sockets='1' cores='8' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/W10-Passthrough.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/dominik/os_images/Win10_1803_English_x64.iso'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/dominik/os_images/virtio-win-0.1.141.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <boot order='3'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:a0:8c:25'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </sound>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x0c70'/>
        <product id='0xf001'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc539'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x1b1c'/>
        <product id='0x1b13'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ivshmem-plain,memdev=ivshmem,bus=pci.0'/>
    <qemu:arg value='-object'/>
    <qemu:arg value='memory-backend-file,id=ivshmem,share=on,mem-path=/dev/shm/looking-glass,size=32M'/>
    <qemu:arg value='-spice'/>
    <qemu:arg value='port=5900,addr=127.0.0.1,disable-ticketing'/>
  </qemu:commandline>
</domain>

It would be really great if anyone could help me because I just can’t find any solution to the problem.

Thanks in advance.

Greetings
4varus

Okay, I think there’s some confusion about the terms here. The host is the OS that’s running on bare metal (usually Linux). The guest is the OS that’s running in a VM (probably Windows in this case).


This sounds like the reinitialization bug. You could try using a different BIOS on the card.


Oh yeah, I edited my post. Sorry about that.

I’ve tried a couple of different BIOS versions for the card so far, but there aren’t many gaming BIOS versions out there, and sadly none of the ones I tried fixed the problem…

You can try the BIOS from any RX580.

Just override it with QEMU, rather than flashing it to the card. There are dozens out there.
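
For anyone wondering what the override looks like: with libvirt it is usually done with a <rom> element inside the GPU’s <hostdev> block. This is only a minimal sketch, assuming the RX580 is the device at 0c:00.0 as in the domain XML posted above. The file path is a made-up placeholder, and the ROM file has to be readable by the user QEMU runs as.

```xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
  </source>
  <!-- Hypothetical path: point this at the vBIOS file the guest should see.
       The card itself is not flashed; QEMU only presents this ROM to the guest. -->
  <rom file='/var/lib/libvirt/vbios/rx580.rom'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</hostdev>
```

Only the GPU function (0x0) needs the <rom> line; the HDMI audio function (0x1) stays as it is.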

If it’s not working, I think you’re SOL on the reset issue.


Sounds great, do you maybe have any links on how to do this?

Edit:
I can restart the VM with a different BIOS through QEMU.

Sadly it still crashes when I install drivers…

Edit2:
It works sometimes, but not reliably.

Did you figure this out yet? Your PC specs show that the host GPU is an Nvidia card and the guest GPU is an AMD card, but your grub config is blacklisting the Nvidia driver instead of the AMD driver. That would be my guess. Run lspci -nnk, find your AMD card, and look at the “Kernel driver in use” line. It should show vfio-pci. If it shows amdgpu (or radeon) instead, then the card isn’t being claimed by vfio-pci.

No, I haven’t resolved this. I will just use the RX580 as host GPU and the GTX980 as guest, which also has the benefit of not needing the proprietary NVIDIA drivers.

Nouveau was blacklisted because I was using the proprietary NVIDIA driver, so that was fine.
Also, vfio-pci was binding the RX580 GPU and its audio device; I had checked that multiple times.

I think Idajava is right. I have a quick question to confirm:
When you are booting into Fedora, do you see the Fedora logo on the screen connected to your AMD GPU? (You left rhgb on.)
dmesg should be of help.

No, I see it on my Nvidia GPU.