How to overclock Vega on Linux...?

Hey, I gave it a shot and the suggested GRUB line made my card pretty glitchy: lots of flickering and flashing.
ASUS Strix RX 480 8G OC card.
Suggested GRUB parameter:
amdgpu.ppfeaturemask=0xfffd7fff

The only other setting I have is amdgpu.dc=0, because I got no display on kernel 4.18.

Do you use dual monitors by any chance? If so, do you still have this using a single monitor?

Sure thing. Actually, if you need any help with testing, I’ll be happy to contribute.

Yes, dual monitors: one HDMI, one DisplayPort.

With one monitor the display is OK and the Wattman app runs, but the overclocks are not applied. When I drop out of a benchmark, the peaks are still the default numbers.

So kernel 4.20 is released. Now I just need to figure out the command line to increase Vega’s power target. Anyone have any ideas?

You can look at the Arch Wiki:

https://wiki.archlinux.org/index.php/AMDGPU

Specifically:

To set the allowed maximum power consumption of the GPU to e.g. 50 Watts, run

echo 50000000 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap

Thanks for the reply. I get “bash: /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap: No such file or directory” because there is indeed no hwmon folder in device. I’ll research more.
I would expect we just echo a value to one of the pp_od_?_? files under /sys/class/drm/card0/device/.
I already had lm-sensors installed, so I ran “sudo sensors-detect” and there is now a hwmon folder under device. As root I ran “echo 315000000 > /sys/class/drm/card0/device/hwmon/hwmon2/power1_cap” and my Vega 64 is indeed pulling more than 210 watts. Finally.
I chose 315000000 (315 watts) because /sys/class/drm/card0/device/hwmon/hwmon2/power1_cap_max shows a value of 315000000.
315 watts is better but still not enough. In Windows the card can use up to 240 watts, and with a power target of +50% it is allowed to pull 360 watts. Oh well, beggars can’t be choosers. Thank you for all the help.
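
A side note for anyone hitting the same “No such file or directory” error: the hwmon number differs between machines (hwmon0, hwmon2 and hwmon4 all show up in this thread). Below is a minimal sketch that looks the directory up instead of hard-coding it. The function name find_hwmon_dir is made up for illustration, and card0 is an assumption about which card you want.

```shell
#!/bin/bash
# Sketch: locate the hwmon directory for a GPU instead of hard-coding hwmonN.
# find_hwmon_dir is a hypothetical helper name, not part of any tool.

# Print the first hwmon* directory under DEVICE_DIR that contains power1_cap.
# Returns non-zero if none is found.
find_hwmon_dir() {
    local device_dir="$1" dir
    for dir in "$device_dir"/hwmon/hwmon*; do
        if [ -e "$dir/power1_cap" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (card0 is an assumption; a second GPU would be card1, and so on):
#   hwmon=$(find_hwmon_dir /sys/class/drm/card0/device) && cat "$hwmon/power1_cap_max"
```

If the function returns nothing, the driver has not exposed the power cap at all, and no path guessing will help.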

Glad you got it working and posted your solution!

I’m writing this to share how I (a Linux noob) increased my Sapphire Vega 64 Nitro+ power target to 50% above the card’s default (240 watts) on Ubuntu 18.10 using kernel 4.20. (I’ve read that this is not possible on 18.04, but I’m not sure.)
I’m using AMDGPU with Mesa 18.3. (I’ve read that versions before 18.3 will not work, but I’m not sure.)
If your GPU has a dual-BIOS switch, you will get 50% more than whatever the selected BIOS is programmed for. I got 315 watts on one BIOS and 360 watts on the other.

  • In /etc/default/grub, modify the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" so it reads
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ppfeaturemask=0xfffd7fff"
    If you have the radeon driver installed, you’ll need to tell the kernel to use amdgpu, and your line will be
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash radeon.si_support=0 amdgpu.si_support=1 amdgpu.ppfeaturemask=0xfffd7fff"
    -run “sudo update-grub” so the change is written to the boot configuration
    -reboot
  • Install lm-sensors (prerequisites are listed in the readme file) and run “sudo sensors-detect”
    -reboot
    When you navigate to /sys/class/drm/card0/device you will see a new folder, “hwmon”. The hwmon2 in the paths below is what I got; the number may differ on your system.
    “/sys/class/drm/card0/device/hwmon/hwmon2/power1_cap_max” lists the maximum power, in microwatts, allowed by your GPU/BIOS
    “/sys/class/drm/card0/device/hwmon/hwmon2/power1_cap” shows the power limit currently in effect (by default, the value from your powerplay tables)
    -In a terminal, run “sudo su”, then “echo 360000000 > /sys/class/drm/card0/device/hwmon/hwmon2/power1_cap” (without the quotation marks), which sets my maximum GPU power consumption to 360 watts.
    Check “/sys/class/drm/card0/device/hwmon/hwmon2/power1_cap” to see whether your setting was applied.
  • Running “watch cat /sys/kernel/debug/dri/0/amdgpu_pm_info” from a root terminal will show GPU info.
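
The power cap step above can be wrapped so that you type watts rather than microwatts and cannot exceed power1_cap_max by accident. This is only a sketch: set_power_cap is a name I made up, and you have to pass in whichever hwmon directory exists on your own system.

```shell
#!/bin/bash
# Sketch: set the power cap in watts, refusing values above power1_cap_max.
# set_power_cap is a hypothetical helper; pass the hwmon directory found on your system.

set_power_cap() {
    local hwmon_dir="$1" watts="$2"
    local max_uw want_uw
    max_uw=$(cat "$hwmon_dir/power1_cap_max") || return 1
    want_uw=$(( watts * 1000000 ))   # sysfs expects microwatts
    if [ "$want_uw" -gt "$max_uw" ]; then
        echo "requested ${watts} W exceeds the card's maximum of $(( max_uw / 1000000 )) W" >&2
        return 1
    fi
    echo "$want_uw" > "$hwmon_dir/power1_cap"
}

# Example, as root:
#   set_power_cap /sys/class/drm/card0/device/hwmon/hwmon2 315
```

The setting does not survive a reboot, so it has to be reapplied each boot if you want it permanent.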

I’m still having issues getting the GPU core and HBM memory overclocks to work, but last night I was actually successful and got a 20.7% FPS increase in the Strange Brigade benchmark.

For the HBM memory overclock I use echo "m 3 1105 1100" > /sys/class/drm/card0/device/pp_od_clk_voltage

I found that when I specify an overclock value only for state 7, the GPU runs in state 6, and when I specify a value for state 6, the GPU just runs in state 5. So now I write my overclock to all three states, and it works.
For the GPU core pstate 5 overclock I use echo "s 5 1765 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage
For the GPU core pstate 6 overclock I use echo "s 6 1765 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage
For the GPU core pstate 7 overclock I use echo "s 7 1765 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage

  • To apply the overclock settings:
    echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
  • If the overclock values are not applying, try
    echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
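
Since the same clock and voltage get written to several states and then committed, the commands above can be generated in a loop. A rough sketch follows; sclk_lines is a made-up helper that only prints the lines, so you can inspect them before anything touches sysfs.

```shell
#!/bin/bash
# Sketch: build the pp_od_clk_voltage lines for several core states at once.
# sclk_lines is a hypothetical helper; it only prints, it does not touch sysfs.

# usage: sclk_lines CLOCK_MHZ VOLTAGE_MV STATE...
sclk_lines() {
    local mhz="$1" mv="$2" state
    shift 2
    for state in "$@"; do
        printf 's %s %s %s\n' "$state" "$mhz" "$mv"
    done
    printf 'c\n'   # commit once, after all states are written
}

# Example, as root (each line must be its own write to the file):
#   od=/sys/class/drm/card0/device/pp_od_clk_voltage
#   echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
#   sclk_lines 1765 1200 5 6 7 | while read -r line; do echo "$line" > "$od"; done
```

Printing first and writing second makes it easy to check the values for typos, which matters when the numbers are voltages.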

Currently I’m using amdgpu.ppfeaturemask=0xffffff, so my GRUB default line is
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash radeon.si_support=0 amdgpu.si_support=1 amdgpu.ppfeaturemask=0xffffff"
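
If you are unsure whether the mask actually reached the kernel after the reboot, something like the sketch below can check it. Both function names are made up. The 0x4000 overdrive bit is what the Arch Wiki’s AMDGPU page uses as far as I can tell, so treat it as an assumption and verify against your kernel.

```shell
#!/bin/bash
# Sketch: pull amdgpu.ppfeaturemask out of a kernel command line and test the
# overdrive bit. Both function names are hypothetical. The 0x4000 bit is an
# assumption taken from the Arch Wiki's AMDGPU page; check your kernel.

ppfeaturemask_from_cmdline() {
    local arg args
    read -r -a args <<< "$1"     # split the command line on whitespace
    for arg in "${args[@]}"; do
        case "$arg" in
            amdgpu.ppfeaturemask=*) printf '%s\n' "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
}

overdrive_enabled() {
    (( ($1 & 0x4000) != 0 ))
}

# Example:
#   mask=$(ppfeaturemask_from_cmdline "$(cat /proc/cmdline)") \
#       && overdrive_enabled "$mask" && echo "overdrive bit set"
```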

Hope this helps someone.
I found much of this info at

https://wiki.archlinux.org/index.php/AMDGPU

Well, with the 4.20 kernel on Arch I can bring the power target up from 145 W to 217 W, which is more than enough for the overclock I’m doing.

I definitely didn’t win the silicon lottery, as I can only get 1440 MHz core and 2250 memory at 1.15 V. If I add more voltage, the card heats up above 75 °C in long runs and becomes unstable. I can push the fans from 3000 rpm up to 3700 rpm for a little more overclocking room, but it’s not worth the noise.

Side note… who can even take that kernel release seriously… I mean might as well name it the green kernel…

Sorry I have nothing constructive to add I’ll shush :joy:

I guess I can give my two cents.

On my Vega 64, raising the power limit, setting the fan speed and changing the memory clock all work. What doesn’t work are the core clock and voltage pstates. The driver disregards everything I write, which makes overclocking/undervolting useless.

Under Windows I had the core at ~1500 MHz actual @ 990 mV, with memory at 1080 MHz @ 980 mV. That made the power draw drop to ~180 W. Now it goes up to 280 W and does only 1600 MHz.

Fedora 29, kernel 4.20.3-200.fc29.x86_64.
I guess flashing the BIOS is an option, but I have never done that with such expensive hardware.

With a Vega 64, flashing the BIOS may not be a viable option. You can flash a Vega 56 with a cryptographically signed Vega 64 BIOS to get higher clocks, watts and voltage. But AFAIK no one has successfully modded a Vega 64 BIOS and booted with it. If someone has, please correct me.

However, all Vegas have dual BIOSes, which greatly reduces the risk of flashing the BIOS. If your flashed BIOS doesn’t work, just flip the switch and revert to the factory BIOS.

I ran into the same issues when testing OC on my Vega 56. WattmanGTK reads the changes fine, but the core clock and voltage don’t actually appear to change. Memory clock and power target are working as expected.

This is what I use for my Radeon VII after the aforementioned kernel parameter has been added.

#!/bin/bash
# Name of the GPU's hwmon directory (e.g. hwmon4); the number varies between systems.
fan=$(ls /sys/class/drm/card0/device/hwmon/)

function set_gpu_fan_speed(){
# 1 = manual fan control
#cat /sys/class/drm/card0/device/hwmon/hwmon4/pwm1_enable
echo "1" > /sys/class/drm/card0/device/hwmon/$fan/pwm1_enable
# 255 = full speed
#cat /sys/class/drm/card0/device/hwmon/hwmon4/pwm1
echo "255" > /sys/class/drm/card0/device/hwmon/$fan/pwm1
}

function set_gpu_mem_clk(){
echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo "m 1 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage
}

function set_gpu_clocks(){
# power cap in microwatts (300 W)
echo 300000000 > /sys/class/drm/card0/device/hwmon/$fan/power1_cap
echo "vc 2 1925 1125" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 1 1800" > /sys/class/drm/card0/device/pp_od_clk_voltage
}

function commit_gpu_config(){
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage &&
echo "2" > /sys/class/drm/card0/device/pp_dpm_mclk &&
echo "6 7" > /sys/class/drm/card0/device/pp_dpm_sclk
}

function monitor_gpu(){
watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info
}

function main(){
set_gpu_fan_speed
set_gpu_mem_clk
set_gpu_clocks
commit_gpu_config
monitor_gpu
}

main

BTW, it must be run as the root user.

Perhaps it can be adjusted to work for you.
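
One thing worth adjusting is the fan: 255 means full speed, which is loud. Here is a small sketch for converting a percentage into the value pwm1 expects. percent_to_pwm is a made-up name, and 255 as the top of the range is an assumption (the driver also exposes pwm1_max, which you can read and pass in instead).

```shell
#!/bin/bash
# Sketch: convert a fan percentage to the value that pwm1 expects.
# percent_to_pwm is a hypothetical helper; the default maximum of 255 is an
# assumption, pass the content of pwm1_max as the second argument to be sure.

percent_to_pwm() {
    local pct="$1" max="${2:-255}"
    if [ "$pct" -lt 0 ] || [ "$pct" -gt 100 ]; then
        echo "percentage must be between 0 and 100" >&2
        return 1
    fi
    echo $(( pct * max / 100 ))   # integer division, rounds down
}

# Example, as root ($hwmon is the GPU's hwmon directory):
#   echo 1 > "$hwmon/pwm1_enable"            # 1 = manual fan control
#   percent_to_pwm 60 > "$hwmon/pwm1"
```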

I have an issue with AMDGPU on kernel 5.0.3 after applying the custom table below.

#!/bin/bash
echo 300000000 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap

echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level

echo "s 1 991 850" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 2 1084 900" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 3 1138 950" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 4 1200 975" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 5 1401 990" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 6 1536 1050" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 7 1630 1080" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 3 1100 1000" > /sys/class/drm/card0/device/pp_od_clk_voltage
#fan at 100%
echo "1" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1_enable
echo "255" > /sys/class/drm/card0/device/hwmon/hwmon0/pwm1
#commit and apply max states
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "3" > /sys/class/drm/card0/device/pp_dpm_mclk
echo "7" > /sys/class/drm/card0/device/pp_dpm_sclk

Voltage is stuck at 1.2 V and the card thermal throttles heavily. The same table works without issues on another OS. I read through the kernel documentation at https://www.kernel.org/doc/html/latest/gpu/amdgpu.html but did not find any clue as to why the voltages don’t apply.

Perhaps similar to my experience here: Using Vega with GNU/linux

Hi all, I’ve been struggling with OC/UV on my Vega 64 and found out a few things, so I’m sharing them here in case they help people stuck with the same issues. This bumps an old thread; sorry if this has already been addressed somewhere else.

I also started with the “echo s …” and “echo m …” commands and found that some settings do seem to be applied, but not the ones that matter to me. The power limit goes up and my card ends up chewing 300 W, but the UV settings are ignored: core voltage stays at the default 1200 mV and my HBM2 is actually downclocked and stuck at 800 MHz, which seems to be a very common issue. All in all, not what I want to achieve: a mix of efficient OC/UV and <200 W if possible.

So I started to look into the powerplay tables and found a way that seems to work, at least in my case. There is certainly a better, more straightforward way of doing this, but it’s a start.

You need a tool that can produce a binary powerplay table. I do it this way: I use OverdriveNTool in a Windows VM to produce a registry file from a Vega 64 BIOS downloaded from TechPowerUp. Then I use Notepad to remove ALL characters other than the hex code. I save this as an ANSI text file and feed it into the Java tool from here (you need the Java 10 SDK):

https://github.com/xmrminer01102018/VegaToolsNConfigs/tree/master/config/PPTDIR

This produces the powerplay table binary file. If the size of this file is anything other than 694 bytes, or if the Java tool complains, you’re doing something wrong.

(Edit Nov 2019: if you download the .jar file by clicking on it, it seems to be corrupt ATM, or there is something I’m missing with GitHub. Instead, clone the git repository and go to the directory containing the .jar.)

I then feed this binary to the AMDGPU driver with
cat binary_ppt > /sys/class/drm/card0/device/pp_table
(adjust as needed if your card has a different ID)
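
Because a wrong-sized table is the main warning sign here, it may be worth checking the size automatically before writing anything to pp_table. A minimal sketch, assuming the 694-byte figure from this post (Vega 64; other cards will differ); check_ppt_size is a made-up helper name.

```shell
#!/bin/bash
# Sketch: refuse to load a powerplay table whose size is not what you expect.
# check_ppt_size is a hypothetical helper. 694 bytes is the Vega 64 figure from
# the post above; other cards will differ.

check_ppt_size() {
    local file="$1" expected="${2:-694}" actual
    actual=$(wc -c < "$file") || return 1
    actual=$(( actual ))         # strip any padding some wc builds add
    if [ "$actual" -ne "$expected" ]; then
        echo "$file is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Example, as root:
#   check_ppt_size binary_ppt && cat binary_ppt > /sys/class/drm/card0/device/pp_table
```

A size match only rules out an obviously broken file; it says nothing about whether the values inside are sane.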

Then I noticed that my HBM2 was still stuck at 800 MHz. But for whatever reason, if I run

echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage

on top of having applied the PPT, I get what I wanted: all “s” and “m” states are exactly as I set them:

cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 852Mhz 800mV
1: 991Mhz 810mV
2: 1084Mhz 820mV
3: 1138Mhz 840mV
4: 1200Mhz 870mV
5: 1401Mhz 900mV
6: 1536Mhz 930mV
7: 1630Mhz 960mV
OD_MCLK:
0: 167Mhz 800mV
1: 500Mhz 800mV
2: 800Mhz 820mV
3: 1050Mhz 900mV
OD_RANGE:
SCLK: 852MHz 2400MHz
MCLK: 167MHz 1500MHz
VDDC: 800mV 1200mV

This passes a loop of Unigine Superposition: the card draws 190 W, GPU frequency is stable at 1488 MHz (I’m on water), and HBM2 is at 1050 MHz. Yay.
FYI, I use kernel 5.1.9 ATM, with amdgpu.ppfeaturemask=0xffffffff in the kernel boot string and nothing else.

Hope it helps, and if you have a more straightforward way to produce PPT tables on Linux, let me know. Note that I don’t claim my PPT table is stable or optimal; what matters is having the OC/UV settings actually applied to the GPU.
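
Since the whole problem in this thread is settings that look accepted but are not applied, it can help to read the table back and compare it with what you asked for. The sketch below pulls one state out of a pp_od_clk_voltage dump in the format shown above. od_state is a made-up name, and the parsing assumes the layout stays exactly like that listing.

```shell
#!/bin/bash
# Sketch: extract "clock voltage" for one state from a pp_od_clk_voltage dump.
# od_state is a hypothetical helper; it assumes the OD_SCLK:/OD_MCLK: layout
# shown in the post above.

# usage: od_state SECTION INDEX < dump     e.g. od_state OD_SCLK 7
od_state() {
    awk -v sec="$1:" -v idx="$2:" '
        $1 ~ /^OD_[A-Z]+:$/ { cur = $1; next }      # remember the current section
        cur == sec && $1 == idx {
            gsub(/[A-Za-z]/, "", $2)                # "1630Mhz" -> "1630"
            gsub(/[A-Za-z]/, "", $3)                # "960mV"   -> "960"
            print $2, $3
        }
    '
}

# Example:
#   od_state OD_SCLK 7 < /sys/class/drm/card0/device/pp_od_clk_voltage
```

Keep in mind this only shows what the table says; the actual running clocks and voltage still need to be watched in amdgpu_pm_info under load.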

Hi @hagar-dunor ,

Can you share the powerplay table binary so I can test? Thanks.

Sorry, I did not check this for a long time. The attached powerplay binary has the settings shown in my post above: 960-1050.ppt (694 bytes). (Note: Vega 64 only. It sets the HBM2 to 1050 MHz, which will be unstable on a Vega 56 with Hynix HBM2.)

Note that the “manual” step on top of the PPT file,
echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage

is no longer needed as of kernel 5.3.9; it was a driver bug. See https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.9
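
If you script this, the extra write can be made conditional on the kernel version. A sketch, assuming GNU sort (for the -V version ordering); kernel_at_least is a made-up helper name.

```shell
#!/bin/bash
# Sketch: compare the running kernel version against a minimum.
# kernel_at_least is a hypothetical helper; it relies on GNU sort's -V option.

# usage: kernel_at_least MIN_VERSION [VERSION_TO_TEST]
kernel_at_least() {
    local running="${2:-$(uname -r)}"
    running="${running%%-*}"     # drop a distro suffix such as -200.fc29.x86_64
    [ "$(printf '%s\n%s\n' "$1" "$running" | sort -V | head -n 1)" = "$1" ]
}

# Example, as root:
#   kernel_at_least 5.3.9 || echo "m 3 1050 900" > /sys/class/drm/card0/device/pp_od_clk_voltage
```

Distribution kernels sometimes backport fixes, so an older version number does not prove the bug is present; the check is just a reasonable default.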