AMDGPU Fan Control

Up through kernel 5.10, my AMD RX550 GPU fan happily spun at low speed, keeping my GPU cool at idle (around 35-39C). In 5.11 the kernel developers broke something: the fan doesn’t spin until the GPU temperature reaches about 65C. I saw some discussion of this problem in 5.10, but apparently it was fixed there. Now it’s back, and it’s the same in 5.12rc.

So, I found the service “amdgpu-fancontrol”, which lets me set a comfortable fan curve. Here’s a relevant section of code:

# hwmon paths, hardcoded for one amdgpu card, adjust as needed
FILE_PWM=$(echo /sys/class/drm/card0/device/hwmon/hwmon?/pwm1)
FILE_FANMODE=$(echo /sys/class/drm/card0/device/hwmon/hwmon?/pwm1_enable)
FILE_TEMP=$(echo /sys/class/drm/card0/device/hwmon/hwmon?/temp1_input)
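One likely way to end up with “invalid hwmon”: if the `hwmon?` pattern matches nothing (for example because amdgpu has not registered its hwmon device yet), `echo` prints the pattern unchanged, so the script is handed a path that does not exist. A minimal POSIX sh sketch of a check for that case; `resolve_glob` is a hypothetical helper name, not something amdgpu-fancontrol provides:

```shell
# Hypothetical helper (not part of amdgpu-fancontrol): expand a glob and
# print the first existing match; return 1 if nothing matched.
resolve_glob() {
    # $1 is left unquoted on purpose so the shell expands the glob
    for match in $1; do
        # An unmatched glob stays as the literal pattern, which does not exist
        if [ -e "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}

# Example with the pwm1 path from the script above
if FILE_PWM=$(resolve_glob '/sys/class/drm/card0/device/hwmon/hwmon?/pwm1'); then
    echo "Using $FILE_PWM"
else
    echo "hwmon not available yet" >&2
fi
```

The point of the explicit `-e` test is that a plain `$(echo …)` never fails, so without it the script cannot tell "device found" from "device not there yet".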

Here’s the service:

[Unit]
Description=amdgpu-fancontrol

[Service]
Type=simple
ExecStart=/usr/bin/amdgpu-fancontrol

[Install]
WantedBy=multi-user.target

The problem is that the service keeps failing at boot with the message “invalid hwmon”. If I start the service manually after I log in, it works fine, so the service is being started before the hwmon device is created. The number at the end of the hwmon device name (hwmonN) can change between boots or with kernel changes. I tried to find the service that creates hwmon, and it looked to be “sensord.service”. (I don’t recall which command I used to determine this.) I added:

After=sensord.service

under [Unit], and this worked for a while, then stopped. (I think I updated the kernel, but I don’t remember.) I made sure the service was enabled, and it was, but it keeps failing at boot. I can’t find where in the boot sequence to start the service so that it succeeds. This is so annoying that I reverted to 5.10 just to get my fan control back. But I’m running a Ryzen 9 5950X, so I’d like to be on the latest stable kernel.
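Because the hwmonN index is not stable, one option is to look the device up by its `name` attribute instead of by number. This sketch assumes amdgpu registers its hwmon device under the name `amdgpu`; `find_hwmon_by_name` is a hypothetical helper, and the sysfs root is a parameter only so it can be tested outside a real system. It returns the first match, so with several amdgpu cards you would still need to pick one, and it does not help with timing: if the driver has not loaded yet, nothing matches.

```shell
# Hypothetical helper: print the hwmon directory whose "name" file equals
# the requested driver name, independent of the hwmonN index.
# Usage: find_hwmon_by_name NAME [SYSFS_HWMON_ROOT]
find_hwmon_by_name() {
    want=$1
    root=${2:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        # Skip entries without a readable name (or an unmatched glob)
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$want" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example: derive the three paths used by the script from the lookup
if HWMON=$(find_hwmon_by_name amdgpu); then
    FILE_PWM=$HWMON/pwm1
    FILE_FANMODE=$HWMON/pwm1_enable
    FILE_TEMP=$HWMON/temp1_input
    echo "amdgpu hwmon: $HWMON"
else
    echo "no amdgpu hwmon device found" >&2
fi
```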

I’m at my wit’s end. The whole point of Linux is to give us back control of our computers, and then someone makes this decision for us. Ack! (Yes, I know I can start the service manually after each login, but that’s not the proper way to solve the problem.) Can someone give some advice?

Hi,

You may have to load the amdgpu kernel module earlier, or start up the application later.

I had someone report a similar issue for my own fan control app: “no compatible cards found, exiting amdfan.py:199” (Issue #6 · mcgillij/amdfan · GitHub).

I don’t know what distro you’re using, but here’s one potential way to load your module earlier: Kernel mode setting - ArchWiki.
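On Arch-based distributions such as Manjaro, the early KMS approach from that ArchWiki page comes down to listing the driver in the `MODULES` array of `/etc/mkinitcpio.conf` and then regenerating the initramfs (for example with `sudo mkinitcpio -P`). An excerpt, assuming no other modules are listed there yet (if some are, keep them and add `amdgpu`):

```shell
# /etc/mkinitcpio.conf (excerpt)
# Include amdgpu in the initramfs so the driver is loaded early in boot
MODULES=(amdgpu)
```

This only makes the driver available earlier; it does not by itself order the fan control service after the hwmon device appears.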

Depending on your bootloader and its settings, the point at which your card’s driver is loaded can change.

Welcome to the world of systemd 🙄

I can, but you may not like it. Ditch systemd, use OpenRC instead.

HTH!

Thanks for the reply. I’m running Manjaro, and unfortunately adding the module didn’t work. I’m looking through the systemd man pages right now.

Roll back to the kernel that worked? I mean, you have full control of your computer to do that, and full freedom to choose which kernel you run.

Freedom doesn’t automagically mean “everything will be bug-free if I upgrade to new kernels immediately”. There’s a reason RHEL and similar distributions pick a stable kernel and backport fixes to it.

I put a debug statement in the amdgpu-fancontrol script, and I found that at the time the script is invoked, every hwmon device exists except the one for amdgpu.
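A debug statement along those lines might look like this sketch (`list_hwmon` is a hypothetical name); it prints each hwmon device together with its driver name. When the script runs as a systemd service, its standard output goes to the journal by default, so the result can be read with `journalctl -b -u amdgpu-fancontrol`.

```shell
# Hypothetical debug helper: print "hwmonN drivername" for every hwmon device.
# Usage: list_hwmon [SYSFS_HWMON_ROOT]
list_hwmon() {
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        # Skip entries without a readable name (or an unmatched glob)
        [ -r "$dir/name" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/name")"
    done
}

list_hwmon
```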

Here’s the fix to the script /usr/bin/amdgpu-fancontrol:

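# Wait until the amdgpu driver has created its hwmon directory
# (0000:2d:00.0 is this card's PCI address, adjust as needed)
while [[ ! -d $(echo /sys/bus/pci/devices/0000:2d:00.0/hwmon/hwmon?) ]]
do
    echo "waiting for DEVICE_ROOT"
    sleep 1
done
DEVICE_ROOT=$(echo /sys/bus/pci/devices/0000:2d:00.0/hwmon/hwmon?)
echo "Got $DEVICE_ROOT"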
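The loop above waits forever if the card never shows up. A hedged variant with a timeout, written as a hypothetical `wait_for_glob` function in POSIX sh, gives up after a fixed number of seconds so a missing device shows up as a failed service rather than a hung one:

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for a directory matching
# PATTERN to appear; print the first match, or return 1 on timeout.
# Usage: wait_for_glob PATTERN TIMEOUT_SECONDS
wait_for_glob() {
    pattern=$1
    remaining=$2
    while :; do
        # $pattern is unquoted on purpose so the shell expands the glob
        for match in $pattern; do
            if [ -d "$match" ]; then
                printf '%s\n' "$match"
                return 0
            fi
        done
        [ "$remaining" -le 0 ] && return 1
        echo "waiting for $pattern" >&2
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

Usage with the PCI address from the script above: `DEVICE_ROOT=$(wait_for_glob '/sys/bus/pci/devices/0000:2d:00.0/hwmon/hwmon?' 30) || exit 1`. Progress messages go to stderr so they do not end up in `DEVICE_ROOT`.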