I recently noticed that my GPU fans run at full speed when the VM is not running. After a quick search I found that this is expected behavior: vfio-pci is just a stub driver and is not supposed to perform any device-specific function (such as fan control on a GPU).
So I am looking for an alternative solution. Right now the idea is to hand the device over to the nouveau driver (I have a GTX card) while it is not in use, and to switch it back to vfio-pci before running a VM.
I’ve seen some posts where people do this with AMD cards.
I wrote a script loosely based on https://pastebin.com/zLQPHPQk
Switching from vfio-pci to nouveau works fine (I boot with vfio-pci bound by default). The fans are quiet.
But as soon as I try to reclaim the device for vfio-pci, the nouveau driver crashes with a NULL pointer dereference.
I do not have much of a software answer; I would also be interested in a solution. On the previous iteration of my PC I went full MacGyver and controlled the fans with an Arduino board and a Python script that would start the fans when the VM was running. Not very elegant, I know.
I am having the same problem and I got the driver switch working by using the following script as a QEMU hook:
#!/bin/bash
TMP_FILE=/tmp/qemu-hook
DOMAIN=$1
VM_STATE=$2
touch "$TMP_FILE"
echo "Event $DOMAIN $VM_STATE" >> "$TMP_FILE"

# Attaches a PCIe device to the given driver
# Args:
#   $1 - PCI address (domain:bus:device.function)
#        For example: "0000:03:00.0"
#   $2 - Vendor ID
#        For example: "0x10de"
#   $3 - Product ID
#        For example: "0x17c8"
#   $4 - Driver to attach (optional; if omitted, the device is only unbound)
#        For example: "vfio-pci"
attach_driver()
{
    if [ $# -eq 4 ]; then
        echo "Attach driver: $1 $2 $3 $4"
    fi
    if [ $# -eq 3 ]; then
        echo "Attach driver: $1 $2 $3"
    fi
    # Unbind from whatever driver currently owns the device
    if [ -d "/sys/bus/pci/devices/$1/driver/" ]; then
        echo "$1" > "/sys/bus/pci/devices/$1/driver/unbind"
    fi
    # Register the vendor/product ID with the target driver so it claims the device
    if [ $# -eq 4 ]; then
        echo "$2 $3" > "/sys/bus/pci/drivers/$4/new_id"
    fi
}

# Script for win10
if [[ $DOMAIN == "win10" ]]; then
    if [[ $VM_STATE == "prepare" ]]; then
        echo "Windows 10 VM preparing PCIe devices" >> "$TMP_FILE"
        attach_driver "0000:09:00.0" "0x10de" "0x17c8" "vfio-pci" # GTX 980 Ti video
        attach_driver "0000:09:00.1" "0x10de" "0x36b6" "vfio-pci" # GTX 980 Ti HDMI sound
    fi
    if [[ $VM_STATE == "release" ]]; then
        echo "Windows 10 VM releasing PCIe devices" >> "$TMP_FILE"
        attach_driver "0000:09:00.0" "0x10de" "0x17c8" "nouveau" # GTX 980 Ti video
        attach_driver "0000:09:00.1" "0x10de" "0x36b6" # GTX 980 Ti HDMI sound (unbind only)
    fi
fi
To get the hook working just copy it into /etc/libvirt/hooks/, name it qemu and make it executable.
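The install step could look roughly like the sketch below. The function name install_qemu_hook and its optional second argument (an alternative target directory, handy for a dry run) are my own assumptions and not part of libvirt; only the /etc/libvirt/hooks/qemu location comes from the description above.

```shell
#!/bin/bash
# Sketch: install a script as the libvirt QEMU hook.
# install_qemu_hook is a hypothetical helper name.
#   $1 - path to the hook script
#   $2 - target directory (assumption: defaults to /etc/libvirt/hooks)
install_qemu_hook() {
    local src="$1"
    local hook_dir="${2:-/etc/libvirt/hooks}"

    # Create the hooks directory if it does not exist yet
    mkdir -p "$hook_dir" || return 1

    # Copy the script under the name "qemu" and mark it executable in one step
    install -m 0755 "$src" "$hook_dir/qemu"
}
```

Run as root, this would be something like `install_qemu_hook ./my-hook.sh`. As far as I know libvirt only picks up a newly added hook script when the daemon starts, so libvirtd probably needs a restart afterwards (the service name may differ between distributions).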
Unfortunately the nouveau driver isn’t able to control the fans of my GPU (Nvidia GTX 980 Ti).
However, the proprietary nvidia driver is able to control the fan speeds, but the driver switch does not work with it because nvidia is not present in the /sys/bus/pci/drivers/ folder.
I also tried to use driver_override but that did not work either.
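For reference, the driver_override sequence I mean is roughly the one sketched below: set the override, unbind from the current driver, then ask the PCI core to re-probe the device. The function name bind_with_override and the SYSFS_ROOT variable are my own assumptions (SYSFS_ROOT only exists so the function can be tried against a fake directory tree). This is a sketch of the generic sysfs mechanism, not a confirmed way to get the proprietary nvidia driver to bind.

```shell
#!/bin/bash
# Sketch: rebind a PCI device using the sysfs driver_override attribute.
# bind_with_override is a hypothetical helper name.
#   $1 - PCI address, for example "0000:09:00.0"
#   $2 - driver name, for example "vfio-pci"
# SYSFS_ROOT is an assumption for testing; it defaults to the real /sys.
bind_with_override() {
    local dev="$1"
    local drv="$2"
    local pci_dir="${SYSFS_ROOT:-/sys}/bus/pci"
    local dev_dir="$pci_dir/devices/$dev"

    # 1. Restrict the device so that only the named driver may bind to it
    echo "$drv" > "$dev_dir/driver_override" || return 1

    # 2. Unbind from the driver that currently owns the device, if any
    if [ -e "$dev_dir/driver/unbind" ]; then
        echo "$dev" > "$dev_dir/driver/unbind" || return 1
    fi

    # 3. Ask the PCI core to probe the device again; the override decides
    #    which driver is allowed to pick it up
    echo "$dev" > "$pci_dir/drivers_probe"
}
```

A call would look like `bind_with_override "0000:09:00.0" "vfio-pci"`, run as root. The target driver's module has to be loaded already, otherwise the probe in step 3 finds nothing to bind.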
Does anybody know how to reassign the proprietary nvidia driver properly?
Some people told me that I should try to override the vBIOS of the card with one that has a manipulated section for fan control. https://pve.proxmox.com/wiki/Pci_passthrough#romfile
The thing is that I don’t know how to manipulate a vBIOS.
Are there more options to achieve reasonable fan control or even shut the fans down completely while the VM is turned off?