VFIO won't hand off GPU to VM

I’ve been banging my head against the wall trying to figure out why my GPU won’t hand off to a VM under Virt-manager. A little background:

Setup:
Linux Mint 18.3
Asrock x399 Fatal1ty
32 GB Corsair LPX
Host GPU: Gigabyte RX550
Guest GPU: Sapphire RX580 Pulse 8GB

I have followed many of the available tutorials and finally got my system to stub out my RX580 during early kernel module load. I can confirm that when the system boots and I run lspci -nnk, my RX580 and its associated audio device are in the same IOMMU group and both are bound to the vfio-pci driver.
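For anyone who wants to double-check the grouping and driver binding without eyeballing lspci output, here is a minimal Python sketch that reads the same information from sysfs. It assumes the usual /sys/kernel/iommu_groups and /sys/bus/pci/devices layout; the function name iommu_groups and its two path parameters are made up for illustration (the parameters exist only so the function can be pointed at a test directory).

```python
import os


def iommu_groups(groups_root="/sys/kernel/iommu_groups",
                 pci_root="/sys/bus/pci/devices"):
    """Return {group: [(pci_address, driver_or_None), ...]} read from sysfs.

    Hypothetical helper: both paths default to the standard sysfs locations.
    """
    result = {}
    if not os.path.isdir(groups_root):
        # IOMMU disabled in firmware/kernel, or sysfs not available.
        return result
    groups = [g for g in os.listdir(groups_root) if g.isdigit()]
    for group in sorted(groups, key=int):
        dev_dir = os.path.join(groups_root, group, "devices")
        if not os.path.isdir(dev_dir):
            continue
        members = []
        for addr in sorted(os.listdir(dev_dir)):
            # The "driver" entry is a symlink to the bound driver's directory.
            link = os.path.join(pci_root, addr, "driver")
            driver = None
            if os.path.islink(link):
                driver = os.path.basename(os.readlink(link))
            members.append((addr, driver))
        result[group] = members
    return result


if __name__ == "__main__":
    for group, members in iommu_groups().items():
        for addr, driver in members:
            print(f"group {group}: {addr} driver={driver or '(none)'}")
```

Running it once before starting the VM and once after makes it easy to spot a device that has fallen back from vfio-pci to another driver.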

When I start a Windows 7 VM in Virt-manager, I receive no errors, and the system graph ramps up and stays around 25%. When I run lspci on the host after starting the VM, the audio device is shown using the snd_hda_intel audio driver and is no longer listed as using vfio-pci. However, the GPU is still listed with “Kernel driver in use: vfio-pci”. The separate monitor plugged into the guest GPU never even flickers or shows any sign of life. Performance on my host also tanks pretty hard, and the mouse sometimes becomes jittery and unresponsive. An attempt to shut down brings up error output referencing AMD-Vi errors, and the system will not completely shut down unless I physically press and hold the power button on my case.

Additionally, I patched my 4.15.7 kernel (the latest stable one at the time of writing) with the Threadripper reset bug patch (I believe it’s the one @gnif released) to see if it helped. However, I’m still experiencing the issue exactly as described above.

Any suggestions?

If I remember correctly, VFIO passthrough only works when the guest VM is using UEFI/OVMF and not BIOS. Since you’re using Windows 7 for the guest, it’s pretty probable that the VM is set up to use BIOS. I also found that using the Q35 chipset helps.

I have no idea why the audio portion is switching to the default driver, though. May get switched for some reason when the VM tries to use it and can’t. Not sure.
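One way to confirm which firmware and chipset a VM is actually configured with is to look at the <os> section of its libvirt XML. Below is a minimal sketch, assuming the XML comes from something like virsh dumpxml; the function name firmware_summary, the sample domain name win7, and the sample paths are all hypothetical. It treats a loader of type 'pflash' as the sign of an OVMF/UEFI setup, which is how libvirt usually wires it up, so take the result as a hint rather than proof.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample resembling "virsh dumpxml win7" output (trimmed).
SAMPLE = """
<domain type='kvm'>
  <name>win7</name>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
  </os>
</domain>
"""


def firmware_summary(domain_xml):
    """Report the machine type and whether a pflash (UEFI-style) loader is set."""
    root = ET.fromstring(domain_xml.strip())
    os_type = root.find("./os/type")
    loader = root.find("./os/loader")
    loader_path = None
    if loader is not None and loader.text:
        loader_path = loader.text.strip()
    return {
        # e.g. 'pc-q35-...' for Q35 or 'pc-i440fx-...' for i440FX
        "machine": os_type.get("machine") if os_type is not None else None,
        "uefi_pflash": loader is not None and loader.get("type") == "pflash",
        "loader_path": loader_path,
    }


if __name__ == "__main__":
    print(firmware_summary(SAMPLE))
```

A domain with no <loader> element at all normally boots with the default BIOS (SeaBIOS), in which case uefi_pflash comes back False.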

As marasm says, you need to be running a UEFI setup rather than BIOS; otherwise you will likely be dumped into the UEFI shell, which will list a bunch of volumes, none of which are bootable. I had this issue on my own machine, which I solved with the dual-boot method: I already had Windows installed in a dual-boot configuration. That way I was able to (reluctantly) install the Creators Update and use Windows’ conversion tool to convert my MBR disk to GPT, then successfully boot from those same drives once they were allocated to a VM. You can find some relevant info on this HERE

Also, you will note that the chipset I use is i440FX, not Q35. A shorthand way to remember: Q35 for Linux VMs, i440FX for Windows VMs, unless you are having trouble.

In my opinion your issue is somewhat different, as your display is not showing anything at all and your guest VM causes your host to go mental on shutdown. I don’t know if converting from MBR to GPT would fix that, but it would be worth a shot; it may at least fix whether or not the VM displays anything on boot.

Welllllll kinda. It is far easier to use a UEFI GPU but it is possible with Q35 and an older GPU. A UEFI card is more useful if you are using your iGPU for the host as you won’t have to use the VGA arbiter patch.

Thanks everyone for the replies. This is where I currently stand:

  1. I have tried all combinations of firmware (BIOS/UEFI) and chipset. Still experiencing the issue.

  2. I have tried dual booting and can successfully boot the VM without passing through the GPU. The hard drive with Win7 installed is passed through as a bare-metal install and boots to a SPICE client display with no problem. But still no GPU initialization.

  3. I know that my card supports UEFI boot so that shouldn’t be an issue.

  4. And no iGPU, due to being on the Threadripper platform.

  5. I have also tried removing all spice and video QXL virtual devices from the VM. Still no GPU initialization.

  6. I also don’t believe this matters, but the guest GPU is connected to my secondary monitor through DVI (it’s an older monitor I’m using to try and set this up). The only other thought I had was that it wasn’t relaying EDID information the way a dummy plug would to wake up the card. I don’t really know how that works.

  7. I don’t know if it’s relevant, but since I implemented the VFIO stub driver for my guest GPU, my boot-time BIOS splash screen and UEFI interface are at a pretty low resolution, meaning they appear huge on screen, like 800x600 or something.

  8. Additionally, I don’t know if this helps, but I get an error when the system boots (right before my disk password prompt) that says there was some problem loading all early kernel modules. After I enter my disk password, though, it boots to the normal lock screen and the resolution is correct. Could this have anything to do with the problem? It doesn’t seem like it should, since the VFIO driver is grabbing the card during boot.

Sorry I’m just throwing all of this out there without any rhyme or reason. I just wanted to document as many of the steps I’ve tried and issues I keep experiencing. I’m on well over 40 hours of trying different settings, configurations, etc. Nothing has even flickered my secondary monitor.

Actually, your situation sounds very similar to what I battled this weekend. No matter what I did the VM I created wouldn’t show anything. The CPU graph would be steady at about 25%. I checked my working setup on my desktop and noticed it was using a different OVMF type.

I used this guide to install the correct OVMF version (I added the nightly repo), and my VM worked immediately: https://fedoraproject.org/wiki/Using_UEFI_with_QEMU

Not sure how you would do it on Mint, though.

Hi there, same issue.
I’m trying the following.
I’ve downloaded a BIOS for a similar GPU from TechPowerUp (similar rather than an exact match, which is probably why I’m no further forward).
These BIOS files have a header in them, and you have to delete everything above the part that begins:
UªyëK7400éL.wÌVIDEO … .<…IBM VGA Compatible…RT01/18/17

then save it to a new file.
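The "Uª" at the start of that string is the byte pair 0x55 0xAA, the signature that marks the beginning of a PCI option ROM image, so the trimming can also be done without a hex editor. Here is a rough Python sketch of that idea, assuming (as in the dump above) that the real image is the 0x55 0xAA closest before the first "VIDEO" string; the function name strip_rom_header and the file names in the usage note are hypothetical. Keep the original file, since this heuristic may not suit every vendor's dump.

```python
import sys


def strip_rom_header(data):
    """Return the ROM image starting at the 0x55 0xAA signature before 'VIDEO'.

    Hypothetical helper: drops any vendor header that precedes the signature.
    """
    marker = data.find(b"VIDEO")
    if marker == -1:
        raise ValueError("no 'VIDEO' string found; is this a VGA BIOS dump?")
    # Search backwards from the marker for the option ROM signature.
    start = data.rfind(b"\x55\xaa", 0, marker)
    if start == -1:
        raise ValueError("no 0x55 0xAA signature found before 'VIDEO'")
    return data[start:]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python strip_rom.py downloaded.rom trimmed.rom
    with open(sys.argv[1], "rb") as src:
        trimmed = strip_rom_header(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(trimmed)
    print(f"wrote {len(trimmed)} bytes to {sys.argv[2]}")
```

If the file already starts with 0x55 0xAA, the function returns it unchanged.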

I’ve then added the BIOS to the VM’s XML with virsh edit.
It will look like this:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
      </source>
      <!-- Add this line or similar -->
      <rom file='/home/jackf/1080ti.rom'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </hostdev>

where the device I pass through is 0000:41:00.0
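The domain/bus/slot/function attributes in the <source> address are just the pieces of the host PCI address (the domain:bus:slot.function form that lspci -D prints, e.g. 0000:41:00.0) written as hex values. A small sketch of that mapping, in case it helps avoid typos; the function name libvirt_source_address is made up for illustration.

```python
import re


def libvirt_source_address(pci_addr):
    """Turn 'dddd:bb:ss.f' into the <address .../> line used inside <source>."""
    match = re.fullmatch(
        r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])",
        pci_addr,
    )
    if match is None:
        raise ValueError(f"not a PCI address of the form dddd:bb:ss.f: {pci_addr!r}")
    domain, bus, slot, function = match.groups()
    return (
        f"<address domain='0x{domain}' bus='0x{bus}' "
        f"slot='0x{slot}' function='0x{function}'/>"
    )


if __name__ == "__main__":
    # The GPU from the snippet above, and its audio function at .1
    print(libvirt_source_address("0000:41:00.0"))
    print(libvirt_source_address("0000:41:00.1"))
```

The second call shows the companion audio function, which normally sits at function 1 of the same slot and gets its own <hostdev> entry.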

I then had to add the path to the ROM file to AppArmor:
nano /etc/apparmor.d/abstractions/libvirt-qemu
add a line like /home/jackf/1080ti.rom r,

Restart AppArmor or reboot and try again. I no longer get the message about booting in blind mode in the GUI session when I try to start my Ubuntu guest.

However, I have no display output.

My next step will be to try to extract the card’s own BIOS.

I’m not sure about your other errors regarding AMD-Vi. I’ve enabled a few things in the BIOS, and tomorrow I’ll make a list of what I turned on. From memory, there are some ACS settings in a submenu (IIRC it was under AMD CBS), and the settings are undocumented in my manual.

I found the right bios for my card and still no joy.

Edit 2
This may help with the system going ‘glitchy/slow’: enable ACS.
I’m on a Gigabyte board, but you don’t say whether you have enabled this in the BIOS, so here is a rough guide to finding the ACS settings.
On my system it’s under Peripherals > AMD CBS > NBIO Common Options, where you may find ACS.

Small POST screen:
Regarding screen resolution: one of my screens has a scaler processor in it that can be disabled for better response and less lag. The pixels still switch at the same speed, but there is less in the signal path between the GPU and the screen. When I turn it off, my POST screens are tiny.

This video may help regarding bios pass to GPU from KVM :

I solved my issue: I used the Java fix from Reddit for resets.
Have you tried the patch by gnif?
https://forum.level1techs.com/t/threadripper-reset-fixes/123937

It’s a tidier solution, but the Java fix is easier.
Some fix options here.

I also updated my OVMF code and vars as suggested by marasm.
I extracted them from a file called edk2.git-ovmf-x64-0-20180309.b3435.g7548947d04.noarch.rpm from https://www.kraxel.org/repos/jenkins/edk2/

I’m using OVMF-pure-efi.fd, if memory serves.