RTX 3070 fans don't engage when vfio driver is in use and Windows VM is not running?

I just got an RTX 3070 that I’m passing through to a Windows 10 guest. It replaced a GTX 1060 that worked fine in this role. Passthrough itself seems to work fine with the new card, but when the Windows guest is not running and the RTX 3070 sits idle with the vfio-pci driver bound, it gets very hot and the fans don’t run.

I have no idea why this card gets hot doing nothing. If Windows is off, no other programs are using the card, and it’s not driving any displays, it shouldn’t be doing anything that generates heat.

On top of that, the fans do not run. I tried flipping the card’s Q/P (quiet/performance) BIOS switch and it had no effect. The heat sink is so hot (while doing nothing!) that I can’t keep my finger on it for more than a few seconds, so I’d guess it’s around 85 °C. When I boot the Windows VM and the NVIDIA driver in Windows takes over, the fans start up and cool the card.

How can I keep this card from getting so hot when the Windows guest is off, when it is literally doing nothing?

The host OS is Debian 11.

Sounds like the driver may initialise the card and turn off the VBIOS fan control.

Once you shut down the Windows VM, nothing is controlling the fans any more, and Linux can’t take over because the card is passed through.

Do the fans spin up before you start the Windows VM for the first time, or do they never work, regardless of whether you’ve started Windows since a cold boot?

Hmmm, maybe run an unbind/rebind script when stopping/starting the VM?
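As a rough illustration of the unbind/rebind idea, here is a minimal Python sketch that uses the kernel’s standard sysfs PCI interface (`driver/unbind`, `driver_override`, `drivers_probe`) to hand the card to a different driver. The PCI address `0000:01:00.0` and the choice of `nvidia` as the target driver are assumptions. It must run as root, the GPU’s HDMI audio function (typically `.1`) usually needs the same treatment, and it only prints the writes unless given `--apply`. Something like a libvirt `qemu` hook could call it after the VM stops; whether this card tolerates being re-initialized that way is untested here.

```python
"""Hypothetical sketch: move a PCI device to another driver via sysfs.

Assumptions (not from the thread): the GPU sits at PCI address 0000:01:00.0,
the script runs as root, and the target driver's module is already loaded.
By default it only prints the writes it would perform (dry run).
"""
from __future__ import annotations

import sys
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def rebind_plan(address: str, new_driver: str) -> list[tuple[Path, str]]:
    """Return the (sysfs file, value) writes that move `address` to `new_driver`."""
    dev = SYSFS_PCI / "devices" / address
    return [
        # Detach the device from its current driver (this file only exists
        # while some driver is bound).
        (dev / "driver" / "unbind", address),
        # Restrict which driver may claim this device next.
        (dev / "driver_override", new_driver),
        # Ask the PCI core to re-probe the device so that driver claims it.
        (SYSFS_PCI / "drivers_probe", address),
    ]


def apply(plan: list[tuple[Path, str]], dry_run: bool = True) -> None:
    for path, value in plan:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)


if __name__ == "__main__":
    # Example: hand the (hypothetical) GPU back to the host nvidia driver
    # after the VM has shut down.
    apply(rebind_plan("0000:01:00.0", "nvidia"), dry_run="--apply" not in sys.argv)
```

Going the other way before the VM starts is the same plan with `vfio-pci` as the target driver.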

I’ve heard of it the other way around, where the fans don’t spin while the VM is running, but not when the VM is off. Strange.

The computer was off last night. So this morning, I did a cold boot. I did not start the Windows VM at all, just let the host OS run while feeding the dog, checking email, etc. The 3070 warmed up slowly over the course of an hour. (It’s not being used to display anything, and the vfio driver is attached to it.)

After about 90 minutes, the heat sink was hot to the touch: I could keep my finger on it for only about twenty seconds before it was too hot to stand. The fans never came on.

When I start the VM, the fans start spinning as soon as the TianoCore logo disappears, just before the Windows login screen appears, and they keep running until the card cools.

I honestly don’t know much about the vfio driver, except that it’s supposed to be a “dummy” driver so the card only initializes once, when the VM boots (since these cards don’t like having their driver changed after first initialization).
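To check from the host which kernel driver currently owns the card (for example, to confirm that `vfio-pci` is bound while the fans stay off), one can read the device’s `driver` symlink in sysfs. A minimal sketch, assuming a hypothetical PCI address of `0000:01:00.0` (check the real one with `lspci -nnk`):

```python
"""Minimal sketch: report which kernel driver owns a PCI device via sysfs."""
from __future__ import annotations

import os
from pathlib import Path, PurePosixPath

# Hypothetical address; substitute the GPU's real one from `lspci -nnk`.
GPU_ADDRESS = "0000:01:00.0"


def driver_from_link_target(target: str) -> str:
    """The `driver` symlink points at .../bus/pci/drivers/<name>; return <name>."""
    return PurePosixPath(target).name


def bound_driver(device_dir: Path) -> str | None:
    """Return the driver bound to a sysfs PCI device directory, or None if unbound."""
    link = device_dir / "driver"
    if not link.is_symlink():
        return None  # No `driver` link means no driver has claimed the device.
    return driver_from_link_target(os.readlink(link))


if __name__ == "__main__":
    dev = Path("/sys/bus/pci/devices") / GPU_ADDRESS
    print(bound_driver(dev) or "no driver bound")
```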

Is vfio supposed to control the cooling fans? Should it be reported as a bug to the vfio developers?

Should the card have a sense of self-preservation and run its fans even when no driver is telling it to? Should this be reported to ASUS (ROG), the card’s manufacturer?

How can I narrow down whether:

  1. The card is not running the fans because vfio is telling it not to

or:

  2. The card never runs the fans unless a driver explicitly instructs it to (i.e., it has no default behavior in the absence of driver control).

It seems that if the fans don’t run without a driver telling them to, the card could overheat just from pressing Delete at boot and sitting on the BIOS setup screen for too long.

I downloaded ASUS’s GPU Tweak II on Windows, and it lets me set custom fan curves for the card. Unfortunately, this tool seems to control the fans directly from Windows. I can “save a profile” in their software, but I couldn’t find any option that writes my changes to the card itself, so the curve does nothing once the program is closed.

Is there a tool that lets me write fan curves to the card, the same way that you can set fan curves on a motherboard that doesn’t care about OS or drivers?

Do graphics cards normally/historically have fan control on the card itself, regardless of driver? Or do they always require an OS layer driver to tell the fans to run?

Update: The fans do work under plain Linux when I install NVIDIA’s 470 driver manually from its .run installer and have that driver bound at boot.

The fans don’t seem to work under the 460 driver, which is the version Debian currently packages as nvidia-driver. (The 470 driver also makes the card show up in Blender for CUDA and OptiX, while the packaged version does not.)
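With a host NVIDIA driver bound, `nvidia-smi` can report temperature and fan speed, which makes it easier to compare driver versions than judging by touch. A minimal sketch that samples it with the `--query-gpu` and `--format=csv,noheader,nounits` options; the 5-second interval and 12-sample run are arbitrary choices, and this cannot see the card while `vfio-pci` owns it:

```python
"""Minimal sketch: sample GPU temperature and fan speed via nvidia-smi."""
from __future__ import annotations

import shutil
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,fan.speed",
    "--format=csv,noheader,nounits",
]


def parse_line(line: str) -> tuple[int, int | None, int | None]:
    """Parse a row like '0, 45, 30'; non-numeric fields such as '[N/A]' become None."""
    index, temp, fan = (field.strip() for field in line.split(","))

    def to_int(value: str) -> int | None:
        return int(value) if value.isdigit() else None

    return int(index), to_int(temp), to_int(fan)


def sample(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print temperature (°C) and fan duty (%) for every GPU, `samples` times."""
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for line in out.strip().splitlines():
            idx, temp, fan = parse_line(line)
            print(f"GPU {idx}: {temp} °C, fan {fan} %")
        time.sleep(interval_s)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is a host nvidia driver installed?")
    else:
        sample()
```

Running it once under the 460 driver and once under the 470 driver, with the card idle, would show whether the fan reading ever leaves 0 % as the temperature climbs.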

However, there is still an issue when the vfio driver is bound and Windows is not running: I can’t leave the card in that state or it will overheat. Can the firmware on these cards be updated?

I’d counter that by pointing out that even 55 °C can feel hot to the touch, so the card might not be running that hot in the first place.

From what I’ve seen, most modern cards have a silent (zero-RPM) fan mode enabled by default, which keeps the fans off until the GPU is under significant load or reaches a set temperature. As a result, the heatsink itself will become warm even while the GPU is idling and doing little.
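To illustrate what a silent (zero-RPM) fan mode does, here is a toy model of such a fan curve: the fans stay at 0 % until a start temperature, then follow a piecewise-linear curve, and stop again only once the card cools below a lower stop temperature (hysteresis, so they don’t cycle on and off). Every threshold and curve point below is hypothetical and not taken from the RTX 3070’s VBIOS; the real logic and values for this card are not known here.

```python
"""Toy model of a zero-RPM ("silent") fan curve with start/stop hysteresis.

All thresholds and curve points are hypothetical illustration values.
"""
from __future__ import annotations

# (temperature °C, fan duty %) points, interpolated linearly once fans run.
CURVE = [(50, 30), (70, 60), (85, 100)]
START_C = 55  # idle fans start at or above this temperature...
STOP_C = 45   # ...and stop again only once the GPU cools below this.


def interpolate(temp_c: float) -> float:
    """Linear interpolation over CURVE, clamped at both ends."""
    if temp_c <= CURVE[0][0]:
        return float(CURVE[0][1])
    for (t0, d0), (t1, d1) in zip(CURVE, CURVE[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return float(CURVE[-1][1])


def fan_duty(temp_c: float, spinning: bool) -> float:
    """Return the fan duty %, given whether the fans were already spinning."""
    if not spinning and temp_c < START_C:
        return 0.0  # zero-RPM zone: stay silent while cool enough
    if spinning and temp_c < STOP_C:
        return 0.0  # cooled off enough to stop
    return interpolate(temp_c)
```

A controller would call `fan_duty` periodically, passing `spinning` as whether the previous call returned a value above 0. Note that in this model the fans still come on once the start temperature is reached; the behavior described in the question, where they never spin at all, would need something (or nothing) else to be in control.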

The only way to get an accurate assessment would be to use a thermal camera or a probe that can accurately measure the temperatures while the GPU is idling and not bound to a VM.