RX Vega 56 performance decreases significantly when undervolting by 1 mV on Linux (weird undervolting behavior in general)

I am currently trying to undervolt my RX Vega 56 Sapphire Pulse (flashed with the Vega 64 BIOS https://www.techpowerup.com/vgabios/199111/199111) on Linux (4.19.8-2-MANJARO, amdgpu driver). I am using FurMark (GpuTest 0.7.0) at 1080p to stress the GPU.

With default settings, I get around 210 FPS. However, if I undervolt by just 1 mV (all other settings stock), the power state immediately drops to 2 and the frame rate to 170 FPS. Similar performance drops show up in games (Witcher 3) and in machine learning (the benchmark from the rocm/tensorflow Docker image).
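To watch the power-state drop directly instead of inferring it from FPS, the DPM tables that amdgpu exposes in sysfs can be polled. A minimal sketch, assuming the card is `card0` and that `pp_dpm_sclk`/`pp_dpm_mclk` mark the active level with a trailing `*` (which is how amdgpu prints them); `active_level` and `watch_levels` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the active level of an amdgpu pp_dpm_* file.
# Lines look like "7: 1630Mhz *"; only the active level carries the '*'.
active_level() {
  awk '/\*[[:space:]]*$/ { sub(":", "", $1); print $1, $2 }' "$1"
}

# Poll core and memory levels once per second until interrupted (Ctrl+C).
# Assumes card0 unless another device directory is passed as $1.
watch_levels() {
  local dev=${1:-/sys/class/drm/card0/device}
  while true; do
    printf '%s sclk=%s mclk=%s\n' "$(date +%T)" \
      "$(active_level "$dev/pp_dpm_sclk")" \
      "$(active_level "$dev/pp_dpm_mclk")"
    sleep 1
  done
}
```

Running `watch_levels` in a second terminal while FurMark is up should show which core and memory levels are active before and after the 1 mV change.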

With stock settings, the SCLK (read from /sys/kernel/debug/dri/0/amdgpu_pm_info) is around 1310 MHz (the same as on Windows); with the undervolt it reports around 2000 MHz, which is clearly impossible. Edit: this only happens if the undervolt is applied while FurMark is running. After restarting FurMark, it reports 715 MHz, which matches the lower performance but is far lower than it should be.
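Since the SCLK reported in `amdgpu_pm_info` disagrees with the measured performance, it may help to log it with timestamps across a whole run, including the moment the undervolt is applied. A sketch, assuming debugfs is mounted at /sys/kernel/debug, the GPU is DRI minor 0, the script runs as root, and the file contains lines ending in `(SCLK)` and `(MCLK)`; the layout of this debug file is not a stable interface, and `pm_clocks`/`log_pm_clocks` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the clock lines (e.g. "1310 MHz (SCLK)") out of
# amdgpu_pm_info and join them on one line. "(PSTATE_SCLK)" lines are skipped.
pm_clocks() {
  local info=${1:-/sys/kernel/debug/dri/0/amdgpu_pm_info}
  grep -E '\((S|M)CLK\)' "$info" | sed 's/^[[:space:]]*//' | paste -sd ';' -
}

# Append one timestamped sample per second to a log file (Ctrl+C to stop).
log_pm_clocks() {
  local log=${1:-pm_clocks.log}
  while true; do
    printf '%s %s\n' "$(date +%T)" "$(pm_clocks)" >> "$log"
    sleep 1
  done
}
```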

Script for undervolting (including the stock settings):

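#!/usr/bin/env bash
# Run as root. Each entry is "<s|m> <state> <clock in MHz> <voltage in mV>":
# "s" edits a core (SCLK) state, "m" a memory (MCLK) state.
set -e  # stop at the first write the driver rejects, before committing

DIR=/sys/class/drm/card0/device
PP=$DIR/pp_od_clk_voltage

# Core states 0-7 (stock, except P7 at 1199 mV, i.e. the 1 mV undervolt)
echo "s 0 852 800" > "$PP"
echo "s 1 991 900" > "$PP"
echo "s 2 1084 950" > "$PP"
echo "s 3 1138 1000" > "$PP"
echo "s 4 1200 1050" > "$PP"
echo "s 5 1401 1100" > "$PP"
echo "s 6 1536 1150" > "$PP"
echo "s 7 1630 1199" > "$PP"

# Memory states 0-3 (stock)
echo "m 0 167 800" > "$PP"
echo "m 1 500 800" > "$PP"
echo "m 2 800 950" > "$PP"
echo "m 3 945 1100" > "$PP"

# Commit the edited table, then print the table now in effect
echo c > "$PP"
cat "$PP"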
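Between runs it helps to return to the driver's default table so every comparison starts from the same stock state. amdgpu's `pp_od_clk_voltage` accepts `r` to restore the default values and `c` to commit; a minimal sketch with a hypothetical `od_reset` function that takes the same path as `$PP` in the script above (root required). It commits after resetting, which should be harmless if the reset already took effect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restore the default OD table and commit it.
# $1 is the pp_od_clk_voltage path (defaults to card0).
od_reset() {
  local pp=${1:-/sys/class/drm/card0/device/pp_od_clk_voltage}
  echo r > "$pp" || return 1   # reset all states to their default values
  echo c > "$pp" || return 1   # commit the reset table
  cat "$pp"                    # show the table now in effect
}
```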

Overvolting by 1 mV makes no difference. Undervolting the memory by 1 mV, however, causes the same slowdown.
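To compare offsets systematically (for instance -1 mV on the core states only versus -1 mV on the memory states only), the table can be generated rather than hand-edited. This is a hypothetical harness: `STOCK_TABLE`, `od_table` and `od_apply` are illustrative names, and the base values are taken from the script above with P7 set to 1200 mV, on the assumption that the script's 1199 mV is the 1 mV undervolt from a 1200 mV stock value.

```bash
#!/usr/bin/env bash
# Hypothetical base table "<type> <state> <MHz> <mV>", from the script above
# with P7 assumed to be 1200 mV at stock.
STOCK_TABLE=(
  "s 0 852 800"   "s 1 991 900"   "s 2 1084 950"  "s 3 1138 1000"
  "s 4 1200 1050" "s 5 1401 1100" "s 6 1536 1150" "s 7 1630 1200"
  "m 0 167 800"   "m 1 500 800"   "m 2 800 950"   "m 3 945 1100"
)

# od_table CORE_OFFSET_MV MEM_OFFSET_MV: print every state with the offset
# added to its voltage (core offset for "s" lines, memory offset for "m").
od_table() {
  local core=${1:-0} mem=${2:-0} line type state mhz mv
  for line in "${STOCK_TABLE[@]}"; do
    read -r type state mhz mv <<< "$line"
    if [[ $type == s ]]; then
      mv=$((mv + core))
    else
      mv=$((mv + mem))
    fi
    echo "$type $state $mhz $mv"
  done
}

# od_apply PP_FILE CORE_OFFSET_MV MEM_OFFSET_MV: write each state, then
# commit (root required). Stops without committing if a write is rejected.
od_apply() {
  local pp=$1 line
  shift
  while read -r line; do
    echo "$line" > "$pp" || return 1
  done <<< "$(od_table "$@")"
  echo c > "$pp"
}
```

For example, `od_apply /sys/class/drm/card0/device/pp_od_clk_voltage -1 0` would reproduce the core-only test and `... 0 -1` the memory-only test.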

Additionally, if I only overclock the memory to 1 GHz, I get around 160 FPS, and if I also undervolt by 1 mV, it drops to 120 FPS. The power state then goes down to 0.

These observations make me think that some kind of smart behavior is active with stock settings, which gets disabled when undervolting, after which the 220 W power limit is hit. The fact that overvolting is fine might contradict that hypothesis, though (a quick look through the driver source did not turn up anything).

On Windows, with the core states “4 1200 1000; 5 1300 1000; 6 1400 1000; 7 1500 1020” (state, MHz, mV), the GPU pulls around 210 W, but with the same settings on Linux the card hits 220 W. Hopefully kernel 4.20 becomes available soon so that I can raise the power limit.
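To check whether the card really sits at its power cap, the amdgpu hwmon interface reports average power and the cap in microwatts through `power1_average`, `power1_cap` and `power1_cap_max`. A sketch, assuming the card's hwmon directory is the first match under card0's `hwmon/`; `to_watts`, `power_status` and `set_power_cap` are hypothetical helper names, and `set_power_cap` takes whole watts only.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the amdgpu hwmon power files (microwatts).

# Convert microwatts to watts with one decimal place.
to_watts() {
  awk -v uw="$1" 'BEGIN { printf "%.1f\n", uw / 1000000 }'
}

# Print average draw, current cap and maximum cap in watts.
# $1 is the hwmon directory; defaults to the first one under card0.
power_status() {
  local hw=$1
  if [[ -z $hw ]]; then
    local dirs=(/sys/class/drm/card0/device/hwmon/hwmon*)
    hw=${dirs[0]}
  fi
  printf 'average=%s W cap=%s W max=%s W\n' \
    "$(to_watts "$(cat "$hw/power1_average")")" \
    "$(to_watts "$(cat "$hw/power1_cap")")" \
    "$(to_watts "$(cat "$hw/power1_cap_max")")"
}

# Set the cap in whole watts (root only). Values above power1_cap_max
# should be refused by the driver.
set_power_cap() {
  local hw=$1 watts=$2
  echo $((watts * 1000000)) > "$hw/power1_cap"
}
```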

Before anyone says it's due to the BIOS flash: similar behavior occurred with the stock BIOS. Moreover, the issue is not present on Windows.

Update:
With Ubuntu 18.10 and kernel 4.20 (https://github.com/M-Bab/linux-kernel-amdgpu-binaries), the issue still persists. Further, if I set the power cap to 300 W and undervolt to the point where the card would draw 200 W on Windows, it still draws 300 W while performing badly (where is all that power going?). Tested with both the stock and the Vega 64 BIOS. Time to file a kernel bug, I guess…


I noticed that the SoC clock drops from 1200 MHz down to 800 MHz when undervolting the core to 999 mV. Since every stress test has been so confusing to make sense of, I haven't bothered to describe my confusion beyond a note over in the Lounge 🤷‍♂️

This seems similar to the behavior of my card. Did this happen with any undervolt, or only at 999 mV (or was stock already 1000 mV)?

I don't know what the stock voltage is for the 64. I just noticed that when going from 1000 mV down to 999 mV, performance sinks and the SoC clock seems to stay at 800 MHz.

TBH, I would love to find a really aggressive Vega stress test, because this guessing game is just too bothersome to make any sense of.

Most likely this is because the HBM2 is not entirely stable below 1000 mV, so the GPU clocks down along with the HBM2 to allow it to run at the specified voltage.

Perhaps give radeon-profile from the AUR a try:

https://aur.archlinux.org/packages/radeon-profile-git/

How will that program help? It looks like it just displays some information about the GPU.

It works like “TLP” with a GUI, but for Radeon cards: it offers a performance mode, a low-battery mode, fan control, and even overclocking, if I remember correctly. It has to be run with sudo to enable those options.

Thanks, but it only seems to be able to set P-states and use OverDrive, which does not help me.