WattmanGTK AMDGPU overclock (Linux)

Hi everyone,

I am the creator of WattmanGTK for Linux, and I am looking for people to test the software. The goal of the project is to offer a GUI similar to Wattman on the Windows side.

You can find the software here:

If you have any problems, please let me know!

Furthermore, suggestions are also welcome.


I simply hate that fan control; there is my feedback :smiley:

I think laptops would need something like Ryzen Master, which would be amazing to get working in tandem with GPU profiles anyway; that is the picture I got from my short laptop investigations.

Also, these GPU states would need some sort of hide & lock toggles so that it is possible to test whether they are stable.

What kind of fan control would you suggest? That way I can reconsider it :wink:

Three points, so that I can set the middle one to 70C and the highest fan speed that is still silent:

0% = 0C
43% = 70C
100% = max temp
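
For what it's worth, amdgpu already exposes manual fan control through hwmon in sysfs, so a curve like that could be driven from user space. A minimal sketch, assuming the usual sysfs layout (the hwmon index varies per system, the 90C maximum is just an assumption, and this would need to run periodically):

#!/bin/sh
# Sketch: map GPU temperature onto the three-point curve above via hwmon.
HWMON=$(ls -d /sys/class/drm/card0/device/hwmon/hwmon*/ | head -n 1)

echo 1 > "${HWMON}pwm1_enable"                    # 1 = manual fan control, 2 = automatic

TEMP=$(( $(cat "${HWMON}temp1_input") / 1000 ))   # millidegrees -> degrees C

if [ "$TEMP" -le 70 ]; then
    PWM=$(( TEMP * 110 / 70 ))                       # 0% at 0C up to ~43% (110/255) at 70C
else
    PWM=$(( 110 + (TEMP - 70) * (255 - 110) / 20 ))  # ~43% at 70C up to 100% at 90C
fi
[ "$PWM" -gt 255 ] && PWM=255                     # clamp to the 0-255 PWM range

echo "$PWM" > "${HWMON}pwm1"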

This is sick. Too bad I sold my 580 3:

Impressive stuff.

Does it have a “Night Mode”?

Oh nice…

I can test this on a dual VEGA rig in the morning.

I’ll give it a go; I’m primarily interested in undervolting controls and power limits (on Vega). I can also test on an RX 480. Both are reference cards.

On my dual-monitor system (2x HDMI screens on an RX 580), setting the ppfeaturemask to 0xfffd7fff causes a serious problem.

The GPU starts at its base clocks of 300 MHz for core and memory, and it seems there is something wrong relating to the VSync / blanking-interval handling in the amdgpu code.
I get green artifacting horizontal lines flashing down both monitors, AND these lines are in sync with each other across both screens.

I’ve tried to capture it, but it’s rather difficult.

I also have amdgpu.dc=1 and amdgpu.dpm=1 set; on their own, without the feature mask, these cause no issues. I am running a 4.19.4 Linux kernel.
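
For anyone reproducing this: the feature mask is just a kernel parameter, and on a GRUB-based distro it is typically set roughly like this (a sketch; file paths and the regeneration command differ per distro):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.dc=1 amdgpu.dpm=1 amdgpu.ppfeaturemask=0xfffd7fff"

# Then regenerate the GRUB config and reboot, e.g.:
#   grub-mkconfig -o /boot/grub/grub.cfg   # Arch and derivatives
#   update-grub                            # Debian/Ubuntu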

Overall, however, there is an issue that has persisted for some time.
At idle, the amdgpu code forces my RX 580 into a higher power state than needed: the memory clock is always pushed to 2000 MHz, the Vcore is pushed to 0.95 V, and the core clock jumps all over the place, completely unnecessarily. Additionally, the PCIe link very rarely drops into its power-saving state and almost always runs at full speed. This problem does not occur on Windows.

Overall this leads to a substantial increase in power consumption. At idle I measure 180 W via my UPS. Granted, this is a very beefy workstation with lots of disks and monitors also attached to the UPS, but by manually downclocking the GPU to its 300 MHz idle state (artifacting present) or unplugging the second screen, it idles at between 112 W and 128 W, the same as it does on Windows.
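
For reference, the current and forced DPM levels can be inspected and overridden through sysfs; this is roughly what I mean by manually downclocking (a sketch, the card index and the exact levels depend on the card):

# Show the available sclk/mclk levels; the active one is marked with '*'
cat /sys/class/drm/card0/device/pp_dpm_sclk
cat /sys/class/drm/card0/device/pp_dpm_mclk

# Force the lowest power state (clocks drop to the 300 MHz idle levels)
echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level

# Hand control back to the driver afterwards
echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level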

ROCm output below:

========================        ROCm System Management Interface        ========================
================================================================================================
GPU   Temp   AvgPwr   SCLK    MCLK    PCLK           Fan     Perf    PwrCap   SCLK OD   MCLK OD  GPU%
0     38.0c  36.122W  900Mhz  2000Mhz 8.0GT/s, x16   31.76%  auto    145.0W   0%        0%       0%       
1     30.0c  28.102W  300Mhz  300Mhz  2.5GT/s, x8    31.76%  auto    145.0W   0%        0%       0%       
================================================================================================
========================               End of ROCm SMI Log              ========================

Any ideas what’s happening? Bad DC code handling memory clocks and vsync somewhere in the amdgpu codebase?

This looks like I need to make a kernel bug report.

And I know it worked in the past, since I’ve previously made my own overclocking bash script.

Do you still have the same problems when using a single screen? I also have some weird artifacts going on when using dual screens. So I believe this is a kernel driver bug.

The issue is not present with a single monitor attached.

Apparently this is something that was patched in the Windows driver two years ago.

And looking at various other sources, this multi-monitor power management problem seems like a long-standing issue that is well overdue for a fix.

https://community.amd.com/message/2793350#comment-2793350

On older cards (HD 4000-6000) I know there was a hardware DAC reason for this, but on newer cards it’s demonstrably no longer the case, as Windows shows.


Do you know if there is a bug reported for this on bugs.freedesktop.org? Otherwise we should file one.

These two seem related

https://bugs.freedesktop.org/show_bug.cgi?id=108647
https://bugs.freedesktop.org/show_bug.cgi?id=102646

But nobody seems to be directly addressing the power management problem.

Seeing that OC is not yet implemented in WattmanGTK, here’s how I did my overclocking via a basic bash script and a systemd service.

/etc/systemd/system/amdgpu-overclock.service

[Unit]
Description=AMDGPU Overclock
DefaultDependencies=no
After=sysinit.target local-fs.target
Before=basic.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/oc_rx580

[Install]
WantedBy=basic.target

/usr/local/bin/oc_rx580

#!/bin/sh

## Card 0
if [ -f "/sys/class/drm/card0/device/pp_od_clk_voltage" ] ; then
    echo "Overclocking card 0"
    # Set GPU Core clock table
    echo "s 0 300 750" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 1 625 769" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 2 935 887" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 3 1190 1100" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 4 1265 1181" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 5 1305 1150" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 6 1350 1150" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 7 1400 1150" > /sys/class/drm/card0/device/pp_od_clk_voltage

    # Set Memory Clock table
    echo "m 0 300 750" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "m 1 1000 850" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "m 2 2050 950" > /sys/class/drm/card0/device/pp_od_clk_voltage

    # Commit (apply) clock table
    echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "Applied OC Clocks"

    cat /sys/class/drm/card0/device/pp_od_clk_voltage
fi

However, this is a quick and dirty way of doing it for just one specific card.
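
If anyone wants to adapt it, the same tables could be applied to every card that exposes the overdrive interface instead of hard-coding card0; a rough sketch along the same lines (the table entries themselves would be the ones from the script above):

#!/bin/sh
# Sketch: apply the clock tables to every card exposing pp_od_clk_voltage
for OD in /sys/class/drm/card*/device/pp_od_clk_voltage; do
    [ -f "$OD" ] || continue
    echo "Overclocking $(dirname "$OD")"
    echo "s 7 1400 1150" > "$OD"    # ...remaining s/m table entries as above...
    echo "c" > "$OD"                # commit the table
done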

Regarding the stats you are pulling for GPU monitoring: it seems you’re parsing sysfs. Are you considering using ioctls in the future to do it in a more robust manner?

The radeon-profile project already has a nice example of all the amdgpu ioctl calls.


Yes, I am planning to use ioctls in the future. :wink: First I want to implement applying values from the GUI (I already have a dev version where this works), and I want to package it up as a Flatpak.

I am aware of how radeon-profile works and will probably use a similar approach.

Also, please note that your overclock depends on the cardN symlinks, which can change. For a more robust approach, I would not use the symlinks. (See also https://wiki.archlinux.org/index.php/AMDGPU#Overclocking)
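
As a sketch of what I mean: the device directory can be located via its PCI entry rather than the cardN symlink, for example by matching the AMD vendor ID (0x1002):

#!/bin/sh
# Sketch: find AMD GPUs by PCI device instead of relying on the card0 symlink
for DEV in /sys/bus/pci/devices/*; do
    [ "$(cat "$DEV/vendor" 2>/dev/null)" = "0x1002" ] || continue
    [ -f "$DEV/pp_od_clk_voltage" ] || continue
    echo "Found AMD GPU with overdrive support at $DEV"
    # apply the clock tables here, e.g. echo "s 7 1400 1150" > "$DEV/pp_od_clk_voltage"
done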

I have filed a new bug at https://bugs.freedesktop.org/show_bug.cgi?id=108941. Feel free to chime in.


I know it’s a nasty way; it barely does the minimum of path checking :smiley:

I had been wanting to do a project just like this, now that I have free time to work on it. I’d love to contribute to your project. Where did you find documentation on all of the parts that need to be addressed? That was the hardest problem I ran into when researching.

Hey, if you have artifacting on dual monitors, the “fix” is to go single monitor, install Radeon Profile, and set a low power state. I have not had to start it on every boot, although I did install the daemon, so it’s probably running in the background.


I installed it from the AUR

I was able to keep dual monitors by setting the power state to high instead.