After a slew of updates, I’ve just noticed that the fans on my ageing Vega64 have decided to go on strike. I only noticed when I loaded up Steam to plough through a few titles. In short, everything appears to be working well apart from the fans, so any reasonable load results in the card thermal throttling…
So far, I’ve played with a few kernel parameters, poked at sysfs, and tried forcing fixed fan speeds of 50% and 100%, all to no avail…
Arch, 5.11 kernel, mesa-git, etc…
My Google-Fu has reached the point where it no longer pulls up fresh avenues to look into and now has me running in circles.
Usually yeah, I’ve never had a fan just stop all of a sudden, but I’d still check it just in case.
I’d check if the fan spins freely (which it likely does), and then I’d throw it in another machine if you have one.
Well the fans (all three) do indeed spin freely. As for swapping to an alternative tower, unfortunately most of my HW is rack mounted without enough space to shoehorn the card.
For the time being I’m primarily wanting to focus on the SW side, as cobbling together a fresh tower from parts in my spare junk bin (all the past builds that I’ve kept hold of over the years) doesn’t sound like a weekend of entertainment.
I will try and see if I can get the fans spinning up on a slightly older manjaro build and then play spot the difference.
If you think it’s an update issue I would just pull a Live-ISO and boot that, check if the fan works there since Live-ISOs don’t come with updated-everything.
Side note: Vega 64 on 5.10.14 here and no issues as of yet.
I’m in the process of packaging up an AUR / debian package for an amdgpu fan controller I’ve been working on and use myself on my Vega64 and 6800xt.
I’ve got it published to the test PyPI instance as I’m just working out the packaging stuff now, so you could pip install it from there. Or snag it from my GitHub for amdfan if you’re comfortable building your own Arch packages etc.
It may help you troubleshoot, to make sure you’re actually setting the sysfs settings properly.
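For anyone who wants to check the sysfs side by hand first, here is a minimal read-only sketch. It assumes the usual amdgpu hwmon layout under /sys/class/drm/card0/device/hwmon/ and the documented meaning of pwm1_enable (0 = no fan speed control, 1 = manual, 2 = automatic). The function names are made up for illustration and are not part of amdfan.

```python
from pathlib import Path

# Meaning of pwm1_enable as described in the amdgpu hwmon documentation.
PWM_MODES = {0: "no fan speed control", 1: "manual", 2: "automatic"}

FAN_FILES = ("pwm1_enable", "pwm1", "pwm1_max",
             "fan1_input", "fan1_enable", "fan1_target")


def find_hwmon_dirs(card="card0", drm_root="/sys/class/drm"):
    """Return the hwmon directories for a DRM card (the hwmonN index varies)."""
    return sorted(Path(drm_root, card, "device", "hwmon").glob("hwmon*"))


def read_fan_state(hwmon_dir):
    """Read the fan-related files; missing or unreadable ones map to None."""
    state = {}
    for name in FAN_FILES:
        try:
            state[name] = int(Path(hwmon_dir, name).read_text().strip())
        except (OSError, ValueError):
            state[name] = None
    return state


def describe(state):
    """Summarise the control mode, PWM value and measured RPM on one line."""
    mode = PWM_MODES.get(state["pwm1_enable"], "unknown")
    return f"mode={mode}, pwm1={state['pwm1']}, rpm={state['fan1_input']}"


if __name__ == "__main__":
    # Read-only: safe to run as a normal user.
    for hwmon in find_hwmon_dirs():
        print(hwmon, describe(read_fan_state(hwmon)))
```

If a write to pwm1 appeared to succeed but pwm1_enable still reads 2 afterwards, the card is still in automatic mode and the value was most likely ignored.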
Also on Arch, 5.10.16 no issues with the fan control there either.
I will be keeping an eye out for when your package hits the AUR. I pulled it down from github and went through the motions to get it running, with no issues. However, still no dice…
I have been able to confirm that the fans do still work on kernel 5.9.x (Manjaro) and FreeBSD, so that rules out a hardware failure.
After a little more digging about, it appears that my issue could be linked to a regression within the kernel. This commit looks to be the cause…
When I was using a Vega 56 in my old Linux rig, I used a tool called CoreCtrl to edit the fan curve. It’s also useful for raising the power limit, overclocking, etc. It does require that you add amdgpu.ppfeaturemask=0xffffffff to your kernel options, but is otherwise painless to use. Setting that kernel option might also get rid of the permission errors you’ve been seeing in sysfs.
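To confirm the option actually took effect, a small sketch like the one below may help. It assumes the amdgpu module exposes its parameters under /sys/module/amdgpu/parameters/ and that the boot command line is readable at /proc/cmdline. The helper names are hypothetical.

```python
from pathlib import Path


def parse_cmdline_mask(cmdline):
    """Return amdgpu.ppfeaturemask from a kernel command line, or None."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "amdgpu.ppfeaturemask":
            try:
                return int(value, 0)  # accepts both 0x... and decimal
            except ValueError:
                return None
    return None


def read_loaded_mask(path="/sys/module/amdgpu/parameters/ppfeaturemask"):
    """Return the mask the loaded module reports, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip(), 0)
    except (OSError, ValueError):
        return None


def fmt(mask):
    """Format a mask as hex, or 'not set' when it is None."""
    return "not set" if mask is None else f"0x{mask:08x}"


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("kernel cmdline:", fmt(parse_cmdline_mask(cmdline)))
    print("loaded module :", fmt(read_loaded_mask()))
```

If the option is set through modprobe.d rather than the bootloader, it will not appear on the command line, so the value reported by the loaded module is the one to trust.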
Pulled the latest kernel source and built it against my last known good config (so now running 5.12-rc1).
Moved to using modprobe.d/amdgpu.conf (as having “War and Peace” inside GRUB “CMDLINE_LINUX_DEFAULT” was getting really annoying).
Poked about inside /sys/class/drm/card0/device/hwmon/hwmon2/ { pwm1_enable, pwm1, fan1_enable, fan1_target, etc…} Managed to get a flutter from the fans.
Ran “amdfan --manual” (finally, manual control is up). Set it back to auto and all seems well now.
Now it’s sorted, I’ll be running away from poking sysfs for a while, as apparently with my config "Here be schizophrenic dragons!"
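For reference, the manual-then-auto sequence through sysfs can be sketched roughly as follows. This is only an illustration under a few assumptions: the hwmon directory is passed in (hwmon2 above, but the index can differ per boot), pwm1 takes a value from 0 to pwm1_max (usually 255), and the script runs as root. The function names are hypothetical and this is not how amdfan itself is implemented.

```python
from pathlib import Path

PWM_MANUAL = "1"  # pwm1_enable: manual fan control via pwm1
PWM_AUTO = "2"    # pwm1_enable: automatic fan control


def percent_to_pwm(percent, pwm_max=255):
    """Map a 0-100 percentage onto the 0..pwm_max range, clamping the input."""
    percent = max(0, min(100, int(percent)))
    return percent * pwm_max // 100


def read_pwm_max(hwmon_dir, default=255):
    """Read pwm1_max if present, falling back to the assumed default of 255."""
    try:
        return int(Path(hwmon_dir, "pwm1_max").read_text().strip())
    except (OSError, ValueError):
        return default


def set_manual_speed(hwmon_dir, percent):
    """Switch to manual mode first, then write the PWM value. Needs root."""
    hwmon_dir = Path(hwmon_dir)
    value = percent_to_pwm(percent, read_pwm_max(hwmon_dir))
    (hwmon_dir / "pwm1_enable").write_text(PWM_MANUAL)
    (hwmon_dir / "pwm1").write_text(str(value))
    return value


def set_auto(hwmon_dir):
    """Hand fan control back to the driver."""
    Path(hwmon_dir, "pwm1_enable").write_text(PWM_AUTO)


# Example (as root), assuming the hwmon path mentioned above:
#   hwmon = "/sys/class/drm/card0/device/hwmon/hwmon2"
#   set_manual_speed(hwmon, 50)
#   set_auto(hwmon)
```

The order matters in this sketch: pwm1_enable is set to manual before pwm1 is written, since a PWM value written while the card is in automatic mode is not expected to stick.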