2070 Super fan speed stuck at 100% in guest

Hi guys, I am having some problems with GPU fan speed. I’m currently running Fedora 31 headless, with two Windows 10 guests.

I’ve recently upgraded one of the GPUs from an RX 570 Pulse to a 2070 Super Windforce OC 3X. The fan doesn’t spin when the GPU is below 60C. But once it hits 60C the fan ramps all the way up to 100%, while GPU-Z shows it at 40% and 0 RPM. The fan will not spin down after that until I shut down the guest.

Some troubleshooting steps I’ve taken, all to no avail:

  • Uninstalling all drivers with DDU and installing the latest drivers.
  • Using Afterburner to adjust the fan curves (this also ramps the fan up to 100% as soon as I try to adjust the speed).
  • Using the card with another VM image.

The GPU works perfectly on bare metal in another machine. Fan speed can be controlled using Afterburner, and RPM readings are fine. The 1080Ti has been running without any issues all this time. It’s the 2070 Super that’s giving me a headache :sweat_smile:

PC Specs :
2700X, X470 Prime Pro
Corsair Vengeance 64GB 3000MHz running at 2666MHz
Galax 1080Ti EXOC
Gigabyte 2070 Super Windforce OC 3X
Seasonic Prime Gold 850W

Judging by your comments regarding “guests” and Afterburner, is this 2070 passed through to the guest Windows OS?

If so… it’s maybe possible(?) that the sensors for it aren’t passed through to the VM?

If the driver inside the guest can’t read the relevant sensors or talk to the fan controller because the relevant hardware components aren’t passed through, the card may be going into full-speed fan mode as a fail-safe.

i.e., from a thermal perspective the card is acting as it would without any driver control, and that may mean going fairly over the top with fan speed to make sure the card doesn’t die.

Yes, the 2070S is being passed through to the guest Windows OS. I’m not at the machine right now, but it shows up as the GPU, Audio, 1 USB-C controller (why is this here even though there are no USB-C ports on the card?) and 1 other device that I can’t remember off the top of my head. All 4 are being passed through to the VM.

So I’m not sure what else is needed though, seeing that the 1080Ti is fine on the Windows guest. I can post the IOMMU groupings later when I get back.
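
For anyone following along who wants to collect the same IOMMU groupings, here is a minimal Python sketch. It assumes a Linux host with the IOMMU enabled, where the kernel exposes groups under /sys/kernel/iommu_groups. The function name `list_iommu_groups` is made up for this example, and it only prints PCI addresses, so match them against `lspci` output for the device names.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs."""
    groups = {}
    if not os.path.isdir(root):
        # No such directory: IOMMU is disabled or this is not a Linux host.
        return groups
    for name in os.listdir(root):
        devices_dir = os.path.join(root, name, "devices")
        if not name.isdigit() or not os.path.isdir(devices_dir):
            continue
        # Each entry under devices/ is named after a PCI address.
        groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print(f"IOMMU group {group}: {' '.join(devices)}")
```

Run it on the host, not in the guest. If the 2070 shares a group with devices that are not passed through, that would be worth mentioning when posting the output.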

I’m not sure either, just figuring that’s the behaviour most hardware shows if a sensor fails or can’t otherwise be read.

Hopefully if you post the IOMMU groups for the 2070, somebody should be able to point out any device(s) it needs passed through that currently aren’t. :slight_smile:

edit:
I’m guessing the USB-C controller is on the card but just doesn’t have a port hooked up to it. I think USB-C is likely built into NVIDIA’s chipset, and it’s up to the AIB partner whether or not to hook it up.
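
A quick way to see every function the card exposes (GPU, audio, USB-C controller and so on) and which host driver each one is bound to is to read /sys/bus/pci/devices. A rough Python sketch, with some assumptions: the slot prefix `0000:0a:00` is a made-up example address, `slot_functions` is just a name picked for this post, and with a typical VFIO setup you would expect `vfio-pci` as the driver for each passed-through function while the guest is running.

```python
import os


def slot_functions(slot, root="/sys/bus/pci/devices"):
    """Return {pci_address: driver_name_or_None} for each function of a slot.

    `slot` is a domain:bus:device prefix, e.g. "0000:0a:00" (hypothetical).
    """
    result = {}
    if not os.path.isdir(root):
        return result
    for addr in sorted(os.listdir(root)):
        # Functions of one card share the prefix and differ after the dot.
        if not addr.startswith(slot + "."):
            continue
        driver_link = os.path.join(root, addr, "driver")
        if os.path.islink(driver_link):
            # The symlink target's last component is the bound driver's name.
            result[addr] = os.path.basename(os.readlink(driver_link))
        else:
            result[addr] = None  # no driver bound to this function
    return result


if __name__ == "__main__":
    # Replace the example prefix with the 2070's real address from lspci.
    for addr, driver in slot_functions("0000:0a:00").items():
        print(addr, driver or "(no driver)")
```

This only shows what the host has bound. It does not prove that a missing function is what causes the fan behaviour.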

Alright. Kinda got it resolved, but I don’t know the reason why.
I moved the 2070 to the top PCIe slot and the 1080Ti down to the lower one. Everything is working fine now. Both cards’ fans are running normally, and performance is as expected.

Hopefully someone familiar with the issue can help explain it, and anyone else who encounters the same problem can try swapping slots as a possible troubleshooting step as well.
