How to overclock Vega on Linux...?

I ran into the same issues when testing an OC on my Vega 56. WattmanGTK reads the changes fine, but the core clock and voltage don’t actually appear to change. Memory clock and power target are working as expected.


This is what I use for my Radeon VII after the aforementioned kernel parameter has been added.

#!/bin/bash
# Name of the hwmon directory for card0 (e.g. hwmon4); it differs between systems.
fan=$(ls /sys/class/drm/card0/device/hwmon/)

function set_gpu_fan_speed(){
    # Switch to manual fan control (1), then set the PWM to its maximum (255).
    #cat /sys/class/drm/card0/device/hwmon/hwmon4/pwm1_enable
    echo "1" > /sys/class/drm/card0/device/hwmon/$fan/pwm1_enable
    #cat /sys/class/drm/card0/device/hwmon/hwmon4/pwm1
    echo "255" > /sys/class/drm/card0/device/hwmon/$fan/pwm1
}

function set_gpu_mem_clk(){
    echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
    echo "m 1 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage
}

function set_gpu_clocks(){
    # power1_cap is in microwatts: 300000000 = 300 W
    echo 300000000 > /sys/class/drm/card0/device/hwmon/$fan/power1_cap
    echo "vc 2 1925 1125" > /sys/class/drm/card0/device/pp_od_clk_voltage
    echo "s 1 1800" > /sys/class/drm/card0/device/pp_od_clk_voltage
}

function commit_gpu_config(){
    # "c" commits the staged values; then restrict the allowed DPM states.
    echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage &&
    echo "2" > /sys/class/drm/card0/device/pp_dpm_mclk &&
    echo "6 7" > /sys/class/drm/card0/device/pp_dpm_sclk
}

function monitor_gpu(){
    watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info
}

function main(){
    set_gpu_fan_speed
    set_gpu_mem_clk
    set_gpu_clocks
    commit_gpu_config
    monitor_gpu
}

main

BTW, it must be run as the root user.

Perhaps it can be adjusted to work for you.
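
The hwmon number changes between machines (hwmon4 and hwmon0 both show up in this thread), so here is a minimal sketch of looking it up instead of hard-coding it. find_hwmon_dir is a made-up helper name, and the default path assumes the GPU is card0; treat it as a starting point rather than a tested tool.

```shell
#!/bin/bash
# Sketch: resolve the hwmon directory of a card instead of hard-coding hwmonN.
# find_hwmon_dir is a hypothetical helper; the default path assumes card0.
find_hwmon_dir() {
    local device_dir="${1:-/sys/class/drm/card0/device}"
    local dir
    for dir in "$device_dir"/hwmon/hwmon*; do
        # If the glob matched nothing, $dir is the literal pattern and -d fails.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no hwmon directory under $device_dir" >&2
    return 1
}

# Possible usage (as root):
#   hwmon=$(find_hwmon_dir) || exit 1
#   echo 1 > "$hwmon/pwm1_enable"
#   echo 255 > "$hwmon/pwm1"
```

It prints the first hwmon directory it finds and fails loudly if there is none, which avoids writing to a path that does not exist.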


I have an issue with AMDGPU on kernel 5.0.3 after applying the custom table below:

#!/bin/bash
echo 300000000 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap

echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level

echo "s 1 991 850" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 2 1084 900" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 3 1138 950" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 4 1200 975" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 5 1401 990" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 6 1536 1050" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 7 1630 1080" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 3 1100 1000" > /sys/class/drm/card0/device/pp_od_clk_voltage
# fan at 100%
echo "1" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1_enable
echo "255" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1
# commit and apply max states
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "3" > /sys/class/drm/card0/device/pp_dpm_mclk
echo "7" > /sys/class/drm/card0/device/pp_dpm_sclk

The voltage stays stuck at 1.2V and the card thermal-throttles heavily. The same table works without issues on another OS. I read through the kernel documentation at https://www.kernel.org/doc/html/latest/gpu/amdgpu.html but did not find any clue as to why the voltages don’t apply.

Perhaps similar to my experience here: Using Vega with GNU/linux

Hi all, I’ve been struggling with OC/UV on my Vega 64 and found out a few things, so I’m sharing them here in case they help people stuck with the same issues. This bumps an old thread; sorry if this has already been addressed somewhere else.

I also started with the “echo s …” and “echo m …” commands and found that some settings are actually applied, but not the ones that matter to me. The power limit goes up and my card ends up chewing 300W, but the UV settings are ignored: core voltage stays at the default 1200mV and my HBM2 is actually downclocked and stuck at 800MHz, which seems to be a very common issue. All in all, not what I want to achieve: a mix of efficient OC/UV and <200W if possible.

So I started to look into the powerplay tables, and I found a way that seems to work, at least in my case. There is certainly a better, more straightforward way of doing this, but at least it’s a start.

You need a tool that is able to produce a binary powerplay table. I do it this way: I use OverdriveNTool in a Windows VM to produce a registry file from a Vega 64 BIOS downloaded from TechPowerUp. Then I use Notepad to remove ALL characters other than the hex code. I save the result as an ANSI text file and feed it into the Java tool found here (you need the Java 10 SDK):

https://github.com/xmrminer01102018/VegaToolsNConfigs/tree/master/config/PPTDIR

This produces the powerplay table binary file. If the size of this file is anything other than 694 bytes, or if the Java tool complains, you’re doing something wrong.

(Edit, Nov 2019: if you download the .jar file by clicking on it, it seems to be corrupt ATM, or there is something I’m missing with GitHub. What you need to do is clone the git repository and then go to the directory containing the .jar.)

I then feed this binary to the AMDGPU driver with a
cat binary_ppt > /sys/class/drm/card0/device/pp_table
(adjust as needed if your card has a different ID)
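
Since a wrong-sized file is the main warning sign mentioned above, a small guard before the write may be worth having. This is only a sketch: write_pp_table is a hypothetical helper name, the 694-byte default is the size quoted for this particular Vega 64 table and may not hold for other cards, and the default target assumes card0.

```shell
#!/bin/bash
# Sketch: refuse to load a powerplay table whose size is not the expected one.
# write_pp_table is a hypothetical helper; 694 is the size quoted in this
# thread for a Vega 64 table and is an assumption for any other card.
write_pp_table() {
    local table="$1"
    local target="${2:-/sys/class/drm/card0/device/pp_table}"
    local expected="${3:-694}"
    local size

    if [ ! -f "$table" ]; then
        echo "$table: no such file" >&2
        return 1
    fi
    size=$(wc -c < "$table")
    size=$((size))   # normalise padding that some wc implementations add
    if [ "$size" -ne "$expected" ]; then
        echo "$table is $size bytes, expected $expected; not writing" >&2
        return 1
    fi
    cat "$table" > "$target"
}

# Possible usage (as root):
#   write_pp_table binary_ppt
```

The check only catches size mismatches; it says nothing about whether the table contents are sane for your card.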

Then I noticed that my HBM2 is still stuck at 800MHz. But for whatever reason, if I do a

echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage

on top of having applied the PPT, this gets me where I wanted: all “s” and “m” states are exactly as I wanted them:

cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 852Mhz 800mV
1: 991Mhz 810mV
2: 1084Mhz 820mV
3: 1138Mhz 840mV
4: 1200Mhz 870mV
5: 1401Mhz 900mV
6: 1536Mhz 930mV
7: 1630Mhz 960mV
OD_MCLK:
0: 167Mhz 800mV
1: 500Mhz 800mV
2: 800Mhz 820mV
3: 1050Mhz 900mV
OD_RANGE:
SCLK: 852MHz 2400MHz
MCLK: 167MHz 1500MHz
VDDC: 800mV 1200mV
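
To check a single state from a script rather than reading the whole table, something along these lines might do. It is a rough sketch: od_state is a made-up helper name, and it assumes the Vega 10 style layout printed above (lines of the form "N: clock voltage" under OD_SCLK: and OD_MCLK:); other GPU generations print a different layout.

```shell
#!/bin/bash
# Sketch: print "<clock> <voltage>" for one state of the overdrive table.
# od_state is a hypothetical helper. It assumes the layout shown in this
# thread: "N: <clock>Mhz <voltage>mV" lines under OD_SCLK: / OD_MCLK:.
od_state() {
    local section="$1"   # OD_SCLK or OD_MCLK
    local state="$2"     # state number, e.g. 7
    local file="${3:-/sys/class/drm/card0/device/pp_od_clk_voltage}"

    awk -v section="$section:" -v state_key="$state:" '
        # A new "OD_...:" header switches the current section on or off.
        /^OD_[A-Z_]+:/ { in_section = ($1 == section); next }
        in_section && $1 == state_key { print $2, $3; found = 1; exit }
        END { exit !found }
    ' "$file"
}

# Possible usage:
#   od_state OD_SCLK 7
#   od_state OD_MCLK 3
```

The function exits non-zero when the requested state is not present, so it can be used in an if statement to detect settings that did not stick.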

This passes a loop of Unigine Superposition, the card draws 190W, GPU frequency is at 1488MHz stable (I’m on water), and HBM2 is at 1050MHz. Yay.
FYI I use kernel 5.1.9 ATM, and amdgpu.ppfeaturemask=0xffffffff in the kernel boot string, nothing else.
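
If it is unclear whether the boot parameter actually made it onto the command line, a quick check like the following could help. It is a sketch under assumptions: ppfeaturemask_from_cmdline is a hypothetical helper name, and it only inspects the kernel boot string, so a mask set some other way (for example through module options) would not show up here.

```shell
#!/bin/bash
# Sketch: print the amdgpu.ppfeaturemask value from the kernel command line.
# ppfeaturemask_from_cmdline is a hypothetical helper; it looks at the boot
# string only and returns non-zero when the parameter is absent.
ppfeaturemask_from_cmdline() {
    local cmdline_file="${1:-/proc/cmdline}"
    local -a args
    local arg

    # Split the single-line command line into whitespace-separated words.
    read -r -a args < "$cmdline_file" || true
    for arg in "${args[@]}"; do
        case "$arg" in
            amdgpu.ppfeaturemask=*)
                printf '%s\n' "${arg#amdgpu.ppfeaturemask=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Possible usage:
#   ppfeaturemask_from_cmdline || echo "amdgpu.ppfeaturemask not set at boot"
```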

Hope it helps, and if you have a more straightforward way to produce PPT tables on Linux, let me know. Note that I don’t claim my PPT table is stable or fully optimized; what matters is having the OC/UV settings actually applied to the GPU…


Hi @hagar-dunor ,

Can you share the powerplay table binary so I can test? Thanks.

Sorry, I did not check this for a long time. The attached powerplay binary has the settings shown in my post above: 960-1050.ppt (694 Bytes). Note: Vega 64 only. It sets the HBM2 to 1050MHz, which will be unstable on a Vega 56 with Hynix HBM2.

Note that the “manual” step on top of the ppt file

echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage

is not needed anymore as of kernel 5.3.9. It was a driver bug; see https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.9
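
For a script that has to run on both older and newer kernels, the extra “m” write could be made conditional on the kernel version. The sketch below is a heuristic only: kernel_at_least is a made-up helper name, it relies on the -V option of GNU sort, and a distribution kernel with backported fixes may behave differently from what its version number suggests.

```shell
#!/bin/bash
# Sketch: succeed when a kernel release is at least a given version.
# kernel_at_least is a hypothetical helper. It compares only the numeric
# prefix of the release string and relies on GNU sort's -V option.
kernel_at_least() {
    local wanted="$1"
    local current="${2:-$(uname -r)}"
    local lowest

    current="${current%%[-+]*}"   # drop suffixes such as "-arch1-1"
    lowest=$(printf '%s\n%s\n' "$wanted" "$current" | sort -V | head -n 1)
    [ "$lowest" = "$wanted" ]
}

# Possible usage (as root), assuming card0:
#   if ! kernel_at_least 5.3.9; then
#       echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage
#   fi
```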