Greetings Humans!
First, the question(s):
Has anyone noticed AMD clock behavior has changed in a recent(*) BIOS update?
Second, the story:
The hardware I’m working with: Ryzen 5800X / ASUS Prime X570-Pro / 64GB (4 sticks) G.Skill 3200MHz / EVGA 3060 / Samsung SSD / 360mm AIO / Fractal Meshify / Debian Linux 11
Because I have an addiction to graphs, the first thing I do with any system is set up graphs. I use Prometheus node_exporter on my machine, and a second machine collects the data and puts it into Grafana for me. I only keep 30 days, because it’s just my home machines (my 30 days has rolled over so I can’t get old data out - yes, I regret this now).
So, here’s the graph that started my problems: (8 days of data)
The min and max are simply the smallest and biggest per-core clock readings in a given polling period, and the average is the average across all cores in the same polling period.
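For anyone who wants to reproduce those three numbers without the whole Prometheus/Grafana stack, here is a minimal sketch of one polling period. It assumes the standard Linux cpufreq sysfs layout (`scaling_cur_freq`, reported in kHz); the function names are made up for illustration and this is not how node_exporter itself is implemented.

```python
import glob

# Assumed location of the per-core current frequency (in kHz) on Linux.
SYSFS_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"


def read_core_freqs_khz(pattern=SYSFS_PATTERN):
    """Return one frequency reading (kHz) per core that exposes cpufreq."""
    freqs = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                freqs.append(int(f.read().strip()))
        except (OSError, ValueError):
            # Offline cores or unreadable files are skipped.
            continue
    return freqs


def summarize(freqs_khz):
    """Min, average and max across all cores for one poll, in GHz."""
    if not freqs_khz:
        raise ValueError("no frequency readings")
    to_ghz = 1_000_000  # kHz per GHz
    return {
        "min": min(freqs_khz) / to_ghz,
        "avg": sum(freqs_khz) / len(freqs_khz) / to_ghz,
        "max": max(freqs_khz) / to_ghz,
    }


if __name__ == "__main__":
    readings = read_core_freqs_khz()
    if readings:
        print(summarize(readings))
    else:
        print("no cpufreq data found")
```

Running this in a loop (say once every 15 seconds) and logging the output gives roughly the same min/avg/max view as the graph, just without the pretty colours.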
The numbers we’re looking at specifically are the CPU Min-Min numbers, which jumped from 1.8GHz on the low end to 2.2GHz, and the average for all cores, which rose from in the ballpark of 2.5GHz to around 3.5GHz. The max-max is nice, but it’s not what I’m here for.
The left side of the graph is on the BIOS that shipped with the board, which I think was 2804 - I didn’t expect trouble, so didn’t really take note. The BIOS update was brought about by half of my RAM disappearing one day while watching a YouTube video. The machine hard locked and wouldn’t boot with the “B” channels populated. It had been running since new on this configuration. (6-9 months? pandemic timelines are strange)
I ran on 2 sticks for a while because nobody really needs 64GB of RAM for daily driving. Eventually I had some free time and decided to work out which RAM had died so I could RMA it. I started testing and found that all of it worked fine. While poking around the internet, I found there were quite a few BIOS updates for my board, and a number of them said “RAM compatibility improvements” so I thought “that’s me!” and installed the latest one (4204). The machine is currently running on all sticks of RAM again. Uptime is currently 18 days, so “pretty stable”.
BUT. The clocking behavior has changed since the BIOS update. Instead of defaulting to “be clocked down, clock up when you need it” it seems to have changed to “be mildly clocked at all times” - so much so that it does not seem to be able to go below 2.2GHz anymore, and the average is closer to the max clock than it used to be. The workload hasn’t changed, and the software hasn’t either. (And it definitely didn’t change on the reboot for the new BIOS, as pictured in the graph above.)
I’ve installed all of the BIOSes the board would let me (I don’t seem to be able to go older than the 3xxx version BIOSes) and fiddled with (nearly) every setting and reset the BIOS multiple times, and nothing makes it be like it used to be.
Why am I chasing low clocks? Because power isn’t free, and I don’t need 8 cores running at 3.5GHz to play music in Spotify! I don’t turn my machine off; it goes to sleep after a while of inactivity. I didn’t have speed issues before: when speed was needed, it would happily clock up and stay there while running games or compiling or whatever I was doing at the time. A single-threaded workload would only pull up one core, and leave the rest where they were. Now it seems to be pulling idle cores up as well.
So has anyone else seen this? Is anyone running on an old BIOS and prepared to set up node_exporter, push their numbers into Grafana, and share them here?
I’m also fairly deep down the “it’s the BIOS” rabbit hole, so if anyone has any other things to check, that would also be appreciated.
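If it helps anyone compare notes, here is a rough sketch for dumping what the OS side thinks the frequency limits are, so the 2.2GHz floor can be checked against what the kernel reports. It assumes the usual Linux cpufreq sysfs files (`scaling_driver`, `scaling_governor`, `scaling_min_freq` and friends); `read_policy` is a name I made up for this, and I don’t know yet whether any of these values actually explain the change.

```python
from pathlib import Path

# Standard cpufreq sysfs attributes; frequencies are reported in kHz.
FIELDS = (
    "scaling_driver",
    "scaling_governor",
    "scaling_min_freq",
    "scaling_max_freq",
    "cpuinfo_min_freq",
    "cpuinfo_max_freq",
)


def read_policy(cpu_dir, fields=FIELDS):
    """Read cpufreq attributes for one core; missing files become None."""
    policy = {}
    for name in fields:
        try:
            policy[name] = (Path(cpu_dir) / "cpufreq" / name).read_text().strip()
        except OSError:
            policy[name] = None
    return policy


if __name__ == "__main__":
    # cpu0 is used as a representative core; other cores can be checked
    # the same way by changing the directory.
    for key, value in read_policy("/sys/devices/system/cpu/cpu0").items():
        print(f"{key}: {value}")
```

If `cpuinfo_min_freq` comes back as 2200000 on the new BIOS, that would at least show the higher floor is being advertised by the firmware rather than chosen by the governor, and numbers from someone still on an older BIOS would make a useful comparison.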