QEMU: VFIO Nvidia driver switch

Continuing the discussion from USB 3.0 controller PCIe passthrough:

I am still having problems switching my GPU between the vfio-pci and the nvidia drivers.

For clarity I’ll briefly describe my setup:
Hardware-wise I have an X470 mainboard, a Ryzen 5 2600 and two GPUs (AMD Radeon RX 580 8GB Red Devil by PowerColor and Nvidia GTX 980 Ti 6GB Gaming G1 by Gigabyte). I have a freshly installed Fedora 30 as my host OS and a Windows 10 VM as a KVM guest, with PCIe passthrough set up for my Nvidia GPU.

I removed the device ID of the Nvidia GPU from my /etc/modprobe.d/vfio.conf file because I want the nvidia driver to control the fans while my Windows VM is turned off. After that I installed the nvidia driver (version 430.40) by following this guide.
Side note: Before the installation both GPUs output a picture to the attached monitors. After the installation my system booted to the command line, but after I deleted the xorg.conf the X server started again and my Nvidia GPU stopped outputting a picture. I didn't care about it at this point because I don't want to use the Nvidia GPU on my host system anyway.

Fan control works fine with the nvidia driver loaded, but obviously I need to switch to the vfio-pci driver before starting my VM and switch back to the nvidia driver after shutting the VM down.
In order to do so I prepared a QEMU hook script which can be found here.
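For reference, a minimal sketch of what such a hook could look like. It assumes libvirt's /etc/libvirt/hooks/qemu interface (guest name, operation and sub-operation as the first three arguments) and uses the kernel's driver_override/drivers_probe sysfs files to hand the card over. The guest name win10 is a hypothetical placeholder, the address is the one used in this thread, and the SYSFS variable only exists so the function can be tried against a fake directory tree. Note that it still performs the same unbind step, so it will block in exactly the same way while something is holding the card.

```shell
#!/usr/bin/env bash
# Minimal sketch of /etc/libvirt/hooks/qemu. libvirt calls it with
# $1 = guest name, $2 = operation, $3 = sub-operation.
# Assumptions: GUEST is a hypothetical placeholder, GPU is the address used in
# this thread, and SYSFS only exists so rebind() can be tested on a fake tree.
set -eu

GUEST="win10"
GPU="0000:09:00.0"
SYSFS="${SYSFS:-/sys}"

# Hand a PCI device over to the driver named in $2.
rebind() {
    local dev="$1" drv="$2"
    local path="$SYSFS/bus/pci/devices/$dev"

    # Only the named driver may claim this device from now on.
    echo "$drv" > "$path/driver_override"
    # Detach it from its current driver, if one is bound.
    # (This is the step that blocks while the driver is still in use.)
    if [ -e "$path/driver" ]; then
        echo "$dev" > "$path/driver/unbind"
    fi
    # Ask the PCI core to probe the device again; driver_override picks the driver.
    echo "$dev" > "$SYSFS/bus/pci/drivers_probe"
}

case "${1:-}/${2:-}/${3:-}" in
    "$GUEST/prepare/begin")
        modprobe vfio-pci
        rebind "$GPU" vfio-pci
        ;;
    "$GUEST/release/end")
        rebind "$GPU" nvidia
        ;;
esac
```

The script has to be executable, and libvirtd generally needs a restart before it notices a newly added hook. Using driver_override means neither driver needs the device ID in a config file; the override simply decides which driver wins the next probe.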

The hook script is triggered as expected, but I am not able to unbind the nvidia driver. libvirtd freezes when it reaches this line of the script:

echo "0000:09:00.0" > /sys/bus/pci/devices/0000:09:00.0/driver/unbind

If I run this line manually in a terminal as root, the terminal freezes as well.

I am not an experienced Linux user, but my assumption is that the nvidia driver is busy and that's why it can't be unbound.
Does anyone have a clue on why that is?
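One way to test that assumption is to look for processes that still have the Nvidia device nodes open right before the unbind. The sketch below walks /proc/<pid>/fd directly; `fuser -v /dev/nvidia*` or `lsof /dev/nvidia*` should give similar information. The function name holders is made up for this example, and it needs root to see other users' processes.

```shell
#!/usr/bin/env bash
# Print "PID command" for every process holding an open file descriptor
# on a path matching the glob pattern in $1 (e.g. '/dev/nvidia*').
# Run as root; otherwise other users' /proc/<pid>/fd entries are unreadable.
holders() {
    local pattern="$1" fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the opened file; skip vanished ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        # Unquoted $pattern on the right-hand side is matched as a glob.
        if [[ $target == $pattern ]]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            echo "$pid $(cat "/proc/$pid/comm" 2>/dev/null)"
        fi
    done | sort -u
}
```

Running `holders '/dev/nvidia*'` as root while the X server is up should show what is still using the card; if the reply below is right, Xorg will be among them.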

Also: I'm not comfortable with the fact that my Nvidia GPU stopped outputting a picture without my having configured it that way.
How would I do this if I intentionally wanted my Nvidia GPU to stop outputting a picture?
Is there a modprobe option for the nvidia driver like there is for vfio-pci?

options vfio-pci disable_vga=1

Or do I have to disable it via xorg.conf?

Thanks in advance.

Xorg (via xorg.conf) calls the Nvidia driver to do modesetting. You have to fully deactivate modesetting before you start QEMU in order to unbind it. It's the same procedure as installing the proprietary driver, where you have to change your runlevel to ensure modesetting isn't triggered.

This is basically a problem if you’re running a desktop environment. You might have to start your VM from a different TTY via CLI.


Changing to runlevel 3 does work:

systemctl set-default multi-user.target

It allows my driver switch to work properly. But this runlevel breaks the fan control of the nvidia driver, and manually setting the fan speed isn't possible either because nvidia-settings tells me that it can't connect.
It seems that I need to start an X server in order to get fan control back.

But with an X server running I am not able to unbind the nvidia driver when I want to start my Windows VM.
BTW: I had my xorg.conf generated by nvidia-xconfig:

sudo nvidia-xconfig -a --allow-empty-initial-configuration --use-display-device=None --virtual=1920x1080 --cool-bits=4 --busid=PCI:10:0:0
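One thing worth double-checking in that command: Xorg's BusID (and, as far as I know, nvidia-xconfig's --busid) uses decimal numbers, while lspci and sysfs print the bus number in hexadecimal. A small helper for the conversion (xorg_busid is a hypothetical name, and only PCI domain 0 is handled):

```shell
#!/usr/bin/env bash
# Convert a PCI address as printed by lspci/sysfs (hexadecimal,
# "0000:09:00.0" or "09:00.0") into Xorg's decimal "PCI:bus:device:function".
xorg_busid() {
    local addr="$1"
    # Drop the domain prefix if the address has one ("0000:").
    if [[ $addr == *:*:* ]]; then
        addr=${addr#*:}
    fi
    local bus=${addr%%:*}     # e.g. "09"
    local rest=${addr#*:}     # e.g. "00.0"
    local dev=${rest%%.*}     # e.g. "00"
    local fn=${rest#*.}       # e.g. "0"
    # The 0x prefix makes printf read the fields as hex (and not "09" as octal).
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

With it, the 0000:09:00.0 address from the unbind command earlier converts to PCI:9:0:0, whereas PCI:10:0:0 corresponds to 0a:00.0 in lspci notation.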

I fiddled around a bit over the last few weeks and found some solutions for Nvidia fan control in a VFIO setup.

So just for the sake of documenting my findings I will briefly list a few options:

  1. Set up a dummy VM with the most bare-bones OS you can find that is still able to run the correct proprietary NVIDIA driver for your GPU, and tell it to boot on host startup. You will obviously have to shut the dummy VM down before you can use the GPU in another VM. (Inefficient, but the only solution I got working without any hardware modifications.)
  2. Hook your GPU fans up to a programmable power switch and tell QEMU to trigger it accordingly by setting up a hook script. (Fast and reliable, but kind of janky.)
  3. Hook your GPU fans up to your motherboard or a manual PWM controller. (I got the idea from a colleague of mine who has an open water-cooling loop.)
  4. Buy a different GPU. I got my hands on a used 1080 Ti Strix from ASUS, and this card doesn't spin its fans up to 100% when it's not under the control of a proprietary driver.
    It has semi-passive fan control, and by default its fans are turned off.

Sidenote:
As a result of option 4 I suspect that the fan control issues with my Gigabyte 980 Ti are vBIOS related.
Might it be possible to identify a vBIOS from another brand that also comes with semi-passive fan control and flash it onto my Gigabyte card?

Be careful flashing other vendors' vBIOSes, especially on Maxwell and later. Different vendors use different GDDR chips and different display output configurations, and most Nvidia GPUs have a single BIOS, not a dual BIOS.
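Before experimenting with that, it seems prudent to keep a copy of the card's original vBIOS. A sketch using the PCI device's rom attribute in sysfs (run as root, ideally while the card is not in use); the function names are hypothetical, and the read may fail or return an incomplete image on some cards, so a dedicated tool like nvflash may be the more reliable way to save it.

```shell
#!/usr/bin/env bash
# Sketch: back up a GPU's current vBIOS via the PCI "rom" sysfs attribute.
# Run as root. dump_vbios/has_rom_signature are hypothetical helper names;
# SYSFS can point at a fake tree for testing.
dump_vbios() {
    local dev="$1" out="$2" rc
    local rom="${SYSFS:-/sys}/bus/pci/devices/$dev/rom"
    echo 1 > "$rom"               # enable reads of the option ROM
    cat "$rom" > "$out"           # copy the image out
    rc=$?
    echo 0 > "$rom"               # disable reads again
    return "$rc"
}

# A PCI option ROM image starts with the bytes 0x55 0xAA.
has_rom_signature() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "55aa" ]
}
```

For example `dump_vbios 0000:09:00.0 gtx980ti-original.rom && has_rom_signature gtx980ti-original.rom`. A matching signature only means the header looks like an option ROM; it doesn't prove the dump is complete.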