GPU Passthrough on Arch Linux - BSOD on driver install (Radeon 7970)

I've been trying to set up GPU passthrough for I don't even know how long anymore... I bought a non-K CPU (for VT-d support) and it seems like I've got everything working as it should, except that I get no video output from the secondary GPU. Both the VGA and HDMI audio devices are bound to vfio-pci and appear to get passed to the Windows VM.

lspci -nnk

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798]
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:254d]
	Kernel driver in use: vfio-pci
	Kernel modules: radeon
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT HDMI Audio [Radeon HD 7970 Series] [1002:aaa0]
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:aaa0]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

PCI grouping seems clean:

find /sys/kernel/iommu_groups/ -type l

/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.1

0000:00:01.0 is a bridge device, which I believe should not be claimed by vfio-pci, and 0000:00:00.0 can be ignored.
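The find listing above only shows bus addresses. A small sketch like the one below walks the same sysfs tree and flags PCI-to-PCI bridges by their class code, which makes it easier to tell the root port (0000:00:01.0 here) apart from the GPU functions. The helper name list_iommu_groups is hypothetical, and it assumes the usual sysfs layout where each device directory has a "class" file.

```shell
#!/bin/bash
# List each device in each IOMMU group and flag PCI-to-PCI bridges.
# Assumes the sysfs layout <root>/<group>/devices/<BDF>, where each <BDF>
# entry leads to a device directory holding a "class" file (e.g. 0x060400
# for a PCI bridge, 0x030000 for a VGA controller).
# <root> defaults to /sys/kernel/iommu_groups but can point at a copy.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group bdf class
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue            # glob matched nothing
        group="${dev%/devices/*}"            # strip "/devices/<BDF>"
        group="${group##*/}"                 # keep just the group number
        bdf="${dev##*/}"                     # e.g. 0000:01:00.0
        class="$(cat "$dev/class" 2>/dev/null)"
        case "$class" in
            0x0604*) echo "group $group: $bdf (PCI bridge, not for vfio-pci)" ;;
            *)       echo "group $group: $bdf" ;;
        esac
    done
}
```

Run it with no argument to read the live /sys/kernel/iommu_groups; on a layout like the one above, the root port in group 1 should be the only entry tagged as a bridge.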

System:
i7 3770
Gigabyte 7970 (updated to latest firmware)
Arch Linux with linux-vfio-lts kernel

I used virt-manager to install Windows 10 64-bit, passing through the VGA and audio devices (no other changes), and it booted fine. In Device Manager I see two "Basic Microsoft Display Driver" devices. The second one has a yellow warning icon and shows error code 31 in its properties. GPU-Z reports specs for the second device that match the 7970. When I attempt to install either the Catalyst or Crimson drivers, I get a BSOD. I also tried Windows 7 64-bit, but both driver installers crashed there too. Installing through Device Manager's "Update Driver" worked, but Windows 7 now crashes during boot (never any video output).

Does anyone have any suggestions at all?


I've had this issue a few times, and it is almost always driver related. My suggestion would be to try an older driver, maybe even the driver disc that came with your 7970. I know this goes against the modern thinking that newer drivers are better (and in most cases they are), but I had to go back a few generations on my R9 270 to get an initial stable install and confirm that the card was passed through without any host binding going on. Once I got the card recognized correctly in Win 7, I moved up through the driver releases until I got to the Crimson drivers, which caused me all kinds of grief.

I'm running the latest Crimson drivers now and seem to be fine, but the first few releases caused BSODs, crashes, and failures to load the OS.

You can find older drivers here...

http://www.guru3d.com/files/index.html

Will give it a shot now. None of the tutorials I read mention needing drivers at all for video output. I was trying to start QEMU with a basic command and, according to the write-up, should have gotten command-line output on the secondary GPU. Does that sound correct? It seems like I should be able to get some output without drivers.

I couldn't get any output on the monitor connected to the passed-through video card until Windows recognized the card and loaded its drivers. It kind of threw me into a panic at first, because the OS would load in the console view in virt-manager but the monitor connected to the passed card stayed black with no signal. After the card was recognized and the driver loaded, the console view only shows the "Starting Windows" splash screen and stays that way as long as the guest is running. It has been that way for every KVM build I have done (8 of them now), so I'd guess it is correct and normal.

(Of course, after loading the drivers the guest OS has to be shut down and rebooted. I'd suggest shutting down instead of restarting so you get a clean reboot.)
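Shutting down rather than restarting can also be done from the host. A minimal sketch using virsh: clean_restart is a hypothetical helper, the 120-second default timeout is an arbitrary assumption, and it relies on the guest honouring the shutdown request that "virsh shutdown" sends (for a Windows guest this is typically an ACPI power-button event).

```shell
#!/bin/bash
# clean_restart DOMAIN [TIMEOUT]: fully shut the guest down, wait until
# libvirt reports it as "shut off", then start it again.
# Returns non-zero if the guest is still running after TIMEOUT seconds.
clean_restart() {
    local dom="$1" timeout="${2:-120}" waited=0
    virsh shutdown "$dom" >/dev/null || return 1
    # virsh domstate prints the state name, e.g. "running" or "shut off"
    until [ "$(virsh domstate "$dom")" = "shut off" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$dom still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    virsh start "$dom" >/dev/null
}
```

For the VM in this thread that would be "clean_restart win7". If the guest ignores the request, the helper gives up instead of forcing it off; "virsh destroy" would be the forceful alternative.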

BTW, I'm running this on Fedora 23, but I don't think it would make any real difference.

One other thing that might help you understand what is going on: first, you have blacklisted the 7970, so Linux doesn't see the card and can't access it. Windows sees the device but has no clue what it is (card-wise), so it won't touch the card and falls back to its generic driver. The generic driver/card being loaded will display in QEMU/virt-manager because the console view is the default display. The same thing happens if you boot a Windows KVM guest into safe mode: it will not display on the passed-through GPU, but it will display and load in the console view in virt-manager/QEMU.

I know this doesn't seem 100% right, but it is the way I have found it to work, and others have confirmed the same behavior on their systems. I'd guess it's not mentioned because it is what commonly happens; at least that is my opinion.

All the links on http://www.guru3d.com/ are dead. I used the original CD (2010) on Win7 with the same results: the installer crashes, and if the driver is installed via Device Manager, Windows BSODs at boot. It has to be something else; plenty of people have used the 7970 and I haven't read of anyone having issues with drivers.

Before I tried virt-manager, I used this script to test:

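#!/bin/bash
# Start each run from a fresh copy of the OVMF variable store template
cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd /tmp/my_vars.fd
# 00:01.0 (the PCIe root port) is left out: vfio-pci will not bind bridge
# devices, so a vfio-pci entry for it would most likely stop QEMU from starting.
# Note: no blank lines inside the backslash-continued command below.
qemu-system-x86_64 \
-enable-kvm \
-m 2048 \
-cpu host,kvm=off \
-vga none \
-device vfio-pci,host=01:00.0,multifunction=on \
-device vfio-pci,host=01:00.1 \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd \
-drive if=pflash,format=raw,file=/tmp/my_vars.fd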

This is from https://bufferoverflow.io/gpu-passthrough/
No output, even though the author stated there should be, and his setup is also on Arch.

When creating a new VM in virt-manager, the Overview tab has a "Firmware:" option with BIOS selected by default, while UEFI is greyed out (not found). Should it be set to BIOS?

I don't see anything wrong with the script, but to be honest I'm not an expert at doing this, much less an expert on Linux. Hopefully someone will come along who has more info than I can provide.

We might try to pull @mythicalcreature into the thread by quoting him; he may be able to help you.

Are you by chance using a second video card for the host?

If so, swap them around so that the host card is in the first PCIe slot. If you're using pci-stub, you'll want to undo that first.

If you mean a second dedicated GPU, then no; I am using integrated Intel graphics on the host. There is only one PCI slot on the motherboard (ITX). Also, I am not using pci-stub; I used vfio-pci instead. I assume that since the VGA device is bound properly, it doesn't matter which method was used.

Does "lspci -k -s 01:" report that vfio-pci is the driver?

Sure looks like it:

    $ lspci -k -s 01:
    01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X]
    	Subsystem: Gigabyte Technology Co., Ltd Device 254d
    	Kernel driver in use: vfio-pci
    	Kernel modules: radeon
    01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT HDMI Audio [Radeon HD 7970 Series]
    	Subsystem: Gigabyte Technology Co., Ltd Device aaa0
    	Kernel driver in use: vfio-pci
    	Kernel modules: snd_hda_intel
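The same check can be scripted without lspci by reading each device's "driver" symlink in sysfs, which is where "lspci -k" gets its "Kernel driver in use" line. A minimal sketch; pci_driver and all_on_vfio are hypothetical helper names, and the sysfs root is a parameter so the functions can be pointed at a copy of the tree.

```shell
#!/bin/bash
# Print the name of the kernel driver bound to a PCI device, or "none".
# Assumes sysfs exposes <root>/<BDF>/driver as a symlink to the driver's
# directory (.../drivers/<name>); <root> defaults to /sys/bus/pci/devices.
pci_driver() {
    local bdf="$1" root="${2:-/sys/bus/pci/devices}"
    if [ -L "$root/$bdf/driver" ]; then
        basename "$(readlink "$root/$bdf/driver")"
    else
        echo "none"
    fi
}

# Print "<BDF>: <driver>" for each device and succeed only if all of them
# are bound to vfio-pci. The first argument is the sysfs devices directory.
all_on_vfio() {
    local root="$1" bdf drv rc=0
    shift
    for bdf in "$@"; do
        drv="$(pci_driver "$bdf" "$root")"
        echo "$bdf: $drv"
        [ "$drv" = "vfio-pci" ] || rc=1
    done
    return "$rc"
}
```

For the card in this thread: all_on_vfio /sys/bus/pci/devices 0000:01:00.0 0000:01:00.1 && echo "both functions on vfio-pci"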

Hmm... which kernel are you using exactly? (uname -sr)

Linux 4.1.20-1-lts

I was using the standard linux-lts kernel, then installed the patched kernel (linux-vfio-lts from the AUR), but still no change. I didn't really need the patches since the PCI groups were clean, but I tried anyway out of desperation :)

Can you update/change to the linux-vfio (non-LTS) or standard Arch kernel instead and try? It might totally be a waste, but AFAIK vfio would've needed to be backported into 4.1 to work, which might've caused issues somewhere.

The only other thing I can think of at the moment is to try different versions of the AMD driver until something works.

Can you paste the libvirt config for your VM?
It should be something like "virsh dumpxml --inactive --security-info (name of VM)", or "virsh edit (name of VM)", which will open it in a text editor.

I will try the standard non-LTS kernel next. What do you mean by "needs to be backported to work"?

sudo virsh edit win7

<domain type='kvm'>
  <name>win7</name>
  <uuid>7238d1e8-561f-48f7-a09f-6dfb35d2e4b8</uuid>
  <memory unit='KiB'>2457600</memory>
  <currentMemory unit='KiB'>2457600</currentMemory>
  <vcpu placement='static'>6</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>IvyBridge</model>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/sbin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/win7.qcow2'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/dom/Desktop/win7.iso'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:87:66:eb'/>
      <source dev='enp3s0' mode='bridge'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='vga' vram='16384' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <redirdev bus='usb' type='spicevmc'>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>

I'm fairly sure vfio-pci wasn't introduced until 4.2, though the lts kernel was probably updated with it later.

From the Arch Linux wiki:

vfio-pci is available in kernel v4.1+, and is the recommended option if your kernel supports it. You can check if this module is available by running:
$ modprobe vfio-pci
If there is no output, you're good to go. If instead you receive modprobe: FATAL: Module vfio-pci not found, use the guide further down for pci-stub instead.
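The wiki's two conditions (kernel 4.1 or newer, and an available vfio-pci module) can be checked in one go. A sketch under a couple of assumptions: GNU "sort -V" is available for the version comparison, and "modinfo" is used instead of "modprobe" so nothing actually gets loaded. version_ge and check_vfio_pci are hypothetical helper names.

```shell
#!/bin/bash
# version_ge A B: succeed if version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Check a kernel release string (default: the running kernel) against the
# 4.1 minimum quoted from the Arch wiki, then ask modinfo, which does not
# load anything, whether a vfio-pci module is installed.
check_vfio_pci() {
    local kver="${1:-$(uname -r)}"
    local base="${kver%%-*}"              # "4.1.20-1-lts" -> "4.1.20"
    if version_ge "$base" 4.1; then
        echo "kernel $base: 4.1 or newer, vfio-pci should be usable"
    else
        echo "kernel $base: older than 4.1, see the pci-stub section"
    fi
    if modinfo vfio-pci >/dev/null 2>&1; then
        echo "vfio-pci: module found"
    else
        echo "vfio-pci: not found by modinfo (may be built in, or missing)"
    fi
}
```

With no argument it checks the running kernel; passing a release string such as "4.1.20-1-lts" (the one reported earlier in this thread) checks that version instead.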

I was using linux-lts because vfio passthrough was broken in the non-LTS kernel, though it's been reported that it is fixed in 4.4+ kernels.

Anything of note in the libvirt config?

xD Sorry then, I've never had any issues with the standard kernel.

As for the config, I can't spot anything for OVMF. Are you absolutely sure this was set up as UEFI? It's possible I just overlooked it.
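One quick way to answer that is to look for a <loader> element inside <os> in the domain XML: a guest set up with OVMF has one, and without it libvirt boots QEMU's default SeaBIOS. The config posted above has only <type> and <boot> there. This is a rough grep/sed sketch rather than an XML parser, and has_uefi_loader and uefi_loader_path are hypothetical names.

```shell
#!/bin/bash
# has_uefi_loader FILE: succeed if a libvirt domain XML dump contains a
# <loader> element (an OVMF/UEFI guest has one; without it, SeaBIOS is used).
has_uefi_loader() {
    grep -q '<loader[ >]' "$1"
}

# uefi_loader_path FILE: print the firmware path inside <loader>...</loader>,
# or nothing if no such element appears on a single line.
uefi_loader_path() {
    sed -n 's:.*<loader[^>]*>\([^<]*\)</loader>.*:\1:p' "$1"
}
```

Usage: virsh dumpxml win7 > /tmp/win7.xml, then has_uefi_loader /tmp/win7.xml || echo "no <loader>: booting SeaBIOS".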

How would one set it up as UEFI? I did wonder about the BIOS/UEFI setting in virt-manager.

The thing is, I did everything relating to the VM in the GUI, since the command-line method failed me as well, so I am not sure how I would do anything with OVMF from the GUI.

Ah, sorry, must've read over that. You have to change it to OVMF before the first boot, so at the end of the setup select "Customize configuration before install" or something like that.

http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-4-our-first.html (here's a guide that i fancy)

Edit: he also goes over setting up cpuset and hugepages, which will improve vCPU performance and reduce overhead a bit.

So it should be set to UEFI and not BIOS; makes sense. It's greyed out for me (not found). The guide doesn't mention where OVMF comes from, so where do I define the OVMF location?
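In libvirt versions of this era, virt-manager typically shows UEFI as "not found" until libvirt knows where the OVMF images live. One place to tell it is the nvram list in /etc/libvirt/qemu.conf, where each entry pairs a firmware code image with its variables template as "CODE:VARS". A hedged sketch that checks both files exist and prints such a line; ovmf_nvram_line is a hypothetical helper, and the paths in the usage example are the edk2.git paths from the QEMU script earlier in this thread, which may differ on other installs.

```shell
#!/bin/bash
# ovmf_nvram_line CODE VARS: check that both OVMF files exist, then print an
# nvram entry suitable for /etc/libvirt/qemu.conf. Prints to stderr and
# returns 1 if either file is missing.
ovmf_nvram_line() {
    local code="$1" vars="$2" f
    for f in "$code" "$vars"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    printf 'nvram = [ "%s:%s" ]\n' "$code" "$vars"
}
```

For example: ovmf_nvram_line /usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd. After adding the printed line to qemu.conf, restart libvirtd (systemctl restart libvirtd) so virt-manager sees the new firmware.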