ASRock X570D4U, X570D4U-2L2T discussion thread

I fundamentally disagree with your points of view.

Do you verify with neutral data from AMD directly that the BIOSes and Firmwares ASRock Rack publishes for your Epyc parts are actually the latest versions and not just the “latest” ASRock Rack releases?

AMD has released these frequent AGESA updates precisely because multiple severe security issues were discovered in the recent past and had to be addressed via AGESA updates.

Zen 2/Zen 3 platforms are well past the point of needing BIOS updates for relatively harmless stuff like, for example, intermittent USB connectivity issues, where you could think about skipping an update. Don’t get me wrong, that’s a very good thing, but it makes it even worse when ASRock Rack just ignores AGESA updates from AMD.

At this point it would basically be “File > Import new AGESA version” and “File > Export changed project as new BIOS release”.

But that would seemingly be too much customer support for ASRock Rack’s bean counters.

Maybe someday we can get somewhat open-source BIOSes that users themselves can maintain when the original manufacturer is no longer willing to.

No…?

From an engineering standpoint “Server” and “Consumer” are just empty market segmentation buzzwords.

I don’t want to depend on an operating system to load microcode fixes if there are proper BIOS-level updates that address the respective issues. Also, it’s not like ASRock Rack’s BIOSes and firmware on their own (setting aside general AGESA-related issues affecting every manufacturer) are perfect and bug-free; they could sorely use more frequent updates to actually mature their products.

Here’s an example of another user whose system couldn’t boot because of bugs that had been fixed a long time ago, but ASRock Rack didn’t bother to publicly release a new BIOS and firmware version.

I’m sure they’d be thrilled to hear that they can just fix these issues with a booted operating system:

This was one of the “other channels” I had mentioned that died down. Another was the FTP guest login credentials for a download server hosted by ASRock Rack, where you could at least get more recent BIOS and firmware updates than on their shitty public website. But this option also seems to be gone now.

I casually read other manufacturers’ Epyc motherboard forums and have not noticed anything that jumps out at me suggesting that different AGESA/BIOS levels are available from them compared to ASRock Rack. It seems everyone is on the same footing, since the updates come from AMD in the first place before each manufacturer’s BIOS maintainers do their own massaging of the base code into their GUI/packaging formats.

I have not been missing any features or having any issues with my current BIOS levels on my boards. When I voiced concerns over issues in the past, going back to when I first started using the Epyc platform, I got very good responses and corrective actions from [email protected]

He got me fixed inventories in the Redfish IPMI and provided new BMC firmware chips at no cost when I needed them.

The last time I accessed their FTP servers, these were the login values:
Login FTP:

user: RackTSD

pass: 73$C$wkZ

You could give it a try, adapting this URL for your X570D4U board:

ftps://[email protected]/racktsd/X470D4U/BIOS/X470D4U4.20

and try shotgunning in some other possible new BIOS name variations.

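Browsers no longer support the FTP protocol, so you will have to resort to a standalone FTP client that can handle FTPS, such as lftp or FileZilla (the OpenSSH sftp client speaks a different, SSH-based protocol and won’t work with an ftps:// URL).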

+1 :+1:

A month or so ago, my teammate got a new beta BIOS for the EPYCD8 out of William. Our experiences apparently differ from yours.

Well, the open source BIOS project is called Coreboot. And judging from the articles at Phoronix, it keeps adding more and more hardware to its supported platforms.

I expect that it will eventually go mainstream, if only because I am sure the board manufacturers would LOVE to palm responsibility for supporting alternative OSes off onto the OSes themselves.

Thanks!

I did get amd_pstate=passive working correctly with CPPC and CPPC Preferred Cores enabled in the BIOS. amd_pstate=active … kind of worked? It loaded the correct driver in Proxmox but the feedback I was getting out of it looked glitchy, like it couldn’t read some values from the CPU that it was able to read in passive mode.

Could you explain the difference between guided and passive? I understand that passive doesn’t do much beyond exposing additional pstates to the OS, but I’m still not one-hundred percent clear on what guided does or the best way to enable it.

This is also the first time I’ve heard of amd_prefcore=enable. Thanks for that. Proxmox just released 6.5.13, so it might be a while until we get to 6.9.0. :slight_smile:
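For anyone else trying to confirm which driver and mode actually got loaded after a reboot, here is a minimal Python sketch. It only reads the sysfs attributes described in the kernel's amd-pstate admin guide (`scaling_driver`, `scaling_governor`, and the global `amd_pstate/status` file); the function names are made up for illustration, and the `cpu_root` parameter exists only so the code can be pointed at a test directory. As far as I understand the guide, passive means the cpufreq governor requests a desired performance level, guided means the driver hands the platform a min/max range and the hardware picks within it, and active leaves the choice to the hardware using energy-performance-preference hints.

```python
from pathlib import Path

# Standard location of the CPU sysfs tree on Linux.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def pstate_summary(cpu_root=CPU_SYSFS):
    """Collect the driver, governor and amd_pstate mode as reported by sysfs.

    cpu_root is a hypothetical override used for testing; on a real system
    the default path is what you want.
    """
    cpu_root = Path(cpu_root)
    cpufreq = cpu_root / "cpu0" / "cpufreq"
    return {
        "scaling_driver": read_text(cpufreq / "scaling_driver"),
        "scaling_governor": read_text(cpufreq / "scaling_governor"),
        "amd_pstate_status": read_text(cpu_root / "amd_pstate" / "status"),
    }


if __name__ == "__main__":
    for key, value in pstate_summary().items():
        print(f"{key}: {value if value is not None else 'not available'}")
```

If `scaling_driver` comes back as `acpi-cpufreq`, the amd_pstate driver did not bind at all; `amd-pstate` corresponds to passive or guided, and `amd-pstate-epp` to active. A missing `amd_pstate/status` file most likely means the running kernel predates that attribute or the driver is not loaded.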

I didn’t see a significant drop in idle power using the passive driver, but I didn’t try guided yet. It was pointed out to me elsewhere that since I was testing at idle, the CPU might already be in its lowest possible power state. I’m going to have to wait until I’ve loaded the server up a bit with VM and LXC workloads to come back to this and do some more testing.

This is off topic and I should probably move it to another thread, but I’ll admit I’m not enamored with the amd_pstate driver in its current state. There’s a whole lot of “this worked for me, but I don’t know why” in the Proxmox forums, Reddit, and even here, and the documentation points at kernel dev docs that are really much lower level than the average sysadmin probably needs to be looking to set up something like this for Proxmox or Windows or any other consumer/server OS. There should be an additional level of abstraction for those of us that want to optimize power draw but aren’t kernel developers who understand the minutiae of how schedulers and governors and all the rest work.

That said, it makes more sense to me that this would be the case if complete feature support is just now making its way into the kernel in the 6.9 release. Maybe once it’s more settled, end-user facing documentation will improve.

I’ve never been able to get amd_pstate=active to work on my Zen 2 Epycs or even a new Zen 3 ES Epyc. When I do that, it actually falls back to the acpi-cpufreq driver. Only amd_pstate=passive will work.

But that is still on the older 6.5 kernels. On my Zen 3 Milan Epyc on the kernels 6.5 through 6.9 I have been able to run amd_pstate=active with no issues.

I’ve been reading through the Phoronix articles, and it seems that on the older kernels amd_pstate didn’t recognize the Zen 2 family IDs, which is fixed in the 6.9 kernels. That is why amd_pstate=active never worked.

Phoronix tested passive and active and promised to test guided as well, but they never did. There was just a single comment that on Epyc CPUs, guided would be preferred.

I tried guided on my Epyc Milan but couldn’t really detect any benefit or changes one way or the other compared to active on my normal Boinc loads.

The amd_pstate docs here explain the various permutations of the driver.
https://docs.kernel.org/admin-guide/pm/amd-pstate.html

Thanks! I’ll check those docs out.

Did you notice appreciably lower power usage at idle? Or is the amd_pstate driver better at lowering power draw when the system is under load?

I didn’t notice a difference at all between any of the amd_pstate modes and the acpi driver at idle–the amd_pstate driver actually seemed to spike to higher power at idle more often.

I never run my systems at idle unless coming directly off reboot and I am messing around with configurations or other junk.

But the idle clocks of amd_pstate are much lower than the acpi-cpufreq driver. Idle cores run at 400 MHz for the amd_pstate driver compared to 1200 MHz for the acpi-cpufreq driver.

Also, an unloaded core transitions much quicker down to the idle state for the pstate driver compared to the cpufreq driver.

What you observed probably is true, the pstate driver under control of the cpu’s built-in power management algorithms can switch core power levels much faster than what the system control of the cpufreq driver can manage.

But I see lower overall system power consumption with the pstate driver compared to the cpufreq driver when the system is actually doing work at my full loads. Which is 24 hours a day. I think it is the faster power transitions over time that account for that.
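If you want to compare idle clocks between the two drivers yourself, something like the following rough Python sketch should do. It reads the standard cpufreq `scaling_cur_freq` attribute (reported in kHz) for every core and converts it to MHz; the function names and the `cpu_root` test override are my own, not part of any tool mentioned here.

```python
from pathlib import Path


def core_frequencies_mhz(cpu_root="/sys/devices/system/cpu"):
    """Map CPU index -> current frequency in MHz, read from scaling_cur_freq (kHz)."""
    freqs = {}
    for freq_file in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        # Directory layout is cpuN/cpufreq/scaling_cur_freq, so go up two levels.
        cpu_index = int(freq_file.parent.parent.name[3:])
        try:
            freqs[cpu_index] = int(freq_file.read_text()) / 1000.0
        except (OSError, ValueError):
            # Offline cores or unreadable files are simply skipped.
            continue
    return dict(sorted(freqs.items()))


def summarize(freqs):
    """Return core count plus lowest and highest frequency, or None if empty."""
    if not freqs:
        return None
    values = list(freqs.values())
    return {"cores": len(values), "min_mhz": min(values), "max_mhz": max(values)}


if __name__ == "__main__":
    current = core_frequencies_mhz()
    for cpu, mhz in current.items():
        print(f"cpu{cpu}: {mhz:.0f} MHz")
    print(summarize(current))
```

Run it a few times on an idle box under each driver; a single snapshot can catch a core mid-transition, so the minimum over several runs is the more telling number.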

Thanks. I just set it up, so I haven’t even set up any VM or LXC instances yet. So … I was trying to test all this before I did that, since having to reboot a running system with an active TrueNAS VM while I changed power driver settings seemed risk-prone.

But from our discussion, I’m not going to be able to meaningfully test the system unless it’s actually doing something.

How are you monitoring system power? I have a managed PDU that lets me see how much a device is drawing from the wall, so I’ve just been watching that.

In any case, I’m probably going to wait until Kernel 6.9 is the default for Proxmox. I’m not good enough at all this yet to run a kernel in Proxmox’s testing repo on a production system. :stuck_out_tongue:

EDIT: What sort of workloads are you doing? I still haven’t gotten a clear answer on whether there’s actually a stability hit on virtualization/Proxmox with the amd_pstate driver.

For the Zen 2 and Zen 3 Epycs and Ryzens, I use zenpower3 module and zenmonitor3 gui app on the desktop. I also can see the overall system power usage on the UPS screen with the monitor turned off.
But that also includes the multiple gpus in each host. So zenpower3 is what I use to determine just the cpu wattage consumed.

Zenpower3 hooks into the standard RAPL interfaces in /sys/devices/platform/amd_energy/

Since I run the Epycs flat out I also bump their cTDP and PPT settings to max package power in the UEFI. Basically running at 240W in the settings and can see as much as 272W under stress tests or around 248W under my normal BOINC loads. I also set Determinism to Power.

Workloads are a mix of cpu and gpu work for various Boinc projects. Everything from basic sse2 to avx2 for the Zen 2/3 hosts. The Zen 4 hosts also do avx512 work when profiling asteroid light curves. Gpus do CUDA apps.

Thanks!

Since I’m on a Proxmox server, I really only have access to command line/text-based GUI monitoring tools, so I’ll need to do some more research.

My ePDU managed power strip thing lets me see how much total power the server draws from the wall in real time, so when I’m curious I just eyeball it when doing stuff. I’m sure I could set up something more automated, but so far this has worked fine for testing.

Well from the command line there is this quick and dirty Ryzen-Epyc RAPL monitor. I tried it out just a couple of days ago and compared it to what zenmonitor was reporting for power. Works really well.

root@Serenity:/home/keith/Downloads/Utils/rapl-read-ryzen-master# ./ryzen
0 (0), 1 (0), 2 (0), 3 (0), 4 (0), 5 (0), 6 (0), 7 (0)
8 (0), 9 (0), 10 (0), 11 (0), 12 (0), 13 (0), 14 (0), 15 (0)
16 (0), 17 (0), 18 (0), 19 (0), 20 (0), 21 (0), 22 (0), 23 (0)
24 (0), 25 (0), 26 (0), 27 (0), 28 (0), 29 (0), 30 (0), 31 (0)

Detected 32 cores in 1 packages

Core energy units: a1003
Time_unit:10, Energy_unit: 16, Power_unit: 3
Time_unit:0.000976562, Energy_unit: 1.52588e-05, Power_unit: 0.125
Core 0, energy used: 7.33093W, Package: 149.79W
Core 1, energy used: 5.49362W, Package: 149.79W
Core 2, energy used: 5.74692W, Package: 149.79W
Core 3, energy used: 7.25128W, Package: 149.79W
Core 4, energy used: 7.38541W, Package: 149.79W
Core 5, energy used: 7.20139W, Package: 149.79W
Core 6, energy used: 10.0557W, Package: 149.79W
Core 7, energy used: 7.09015W, Package: 149.79W
Core 8, energy used: 6.92093W, Package: 149.79W
Core 9, energy used: 6.74881W, Package: 149.79W
Core 10, energy used: 7.00989W, Package: 149.79W
Core 11, energy used: 7.30164W, Package: 149.79W
Core 12, energy used: 7.10083W, Package: 149.79W
Core 13, energy used: 6.62903W, Package: 149.79W
Core 14, energy used: 7.00546W, Package: 149.79W
Core 15, energy used: 7.01096W, Package: 149.79W
Core sum: 113.283W

It’s here on GitHub: https://github.com/djselbeck/rapl-read-ryzen
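For the curious, the three "unit" lines in that output can be reproduced by hand. The sketch below is my own illustration in Python, not code from the tool: it assumes the usual RAPL unit register layout (power unit in bits 3:0, energy unit in bits 12:8, time unit in bits 19:16, each meaning 1/2^n) and assumes the energy counters are 32 bits wide and wrap around. The function names are hypothetical.

```python
def decode_rapl_units(raw):
    """Split a RAPL unit register value into (power, energy, time) units.

    Assumed layout: bits 3:0 power, bits 12:8 energy, bits 19:16 time,
    each field n encoding a unit of 1 / 2**n.
    """
    power_unit = 1.0 / (1 << (raw & 0xF))             # watts per count
    energy_unit = 1.0 / (1 << ((raw >> 8) & 0x1F))    # joules per count
    time_unit = 1.0 / (1 << ((raw >> 16) & 0xF))      # seconds per count
    return power_unit, energy_unit, time_unit


def average_watts(raw_before, raw_after, energy_unit, seconds):
    """Average power between two energy counter readings.

    Assumes a 32-bit counter, so the difference is taken modulo 2**32
    to survive a single wraparound between the two samples.
    """
    delta = (raw_after - raw_before) & 0xFFFFFFFF
    return delta * energy_unit / seconds


if __name__ == "__main__":
    # 0xA1003 is the "Core energy units" value from the log above.
    power_unit, energy_unit, time_unit = decode_rapl_units(0xA1003)
    print(power_unit, energy_unit, time_unit)
```

Feeding in `0xA1003` gives exponents 3, 16 and 10, which matches the `Power_unit: 0.125`, `Energy_unit: 1.52588e-05` and `Time_unit: 0.000976562` lines in the log. The per-core numbers the tool labels "energy used" are really average power: the counter difference over the sampling interval, scaled by the energy unit and divided by the elapsed time.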

Oh. Excellent.

This is way easier to visualize than the IPMI sensor readings. :slight_smile:

X570D4U BIOS 1.57, dated 4/11/2024, is available as a beta. Has anyone tried it?

FYI, the x570d4u support has been merged into upstream OpenBMC:

It builds fine, but since I took that machine out of active use, I haven’t tested to see what works and what doesn’t work with it.

I tried it… the BMC won’t boot now, and in turn, the board won’t power on either. I probably flashed it wrong (but the build script sure didn’t tell me what file to flash!)

I need to grab my TTL serial adapter and see what’s going on.
ASRock Rack X570D4U mainboard: pinout | Nicolai Electronics

U-Boot 2019.04 (Mar 25 2024 - 04:43:19 +0000)

SOC : AST2500-A2
RST : WDT1
LPC Mode : SIO:Disable
Eth : MAC0: RGMII, , MAC1: RMII/NCSI,
Model: AST2500 EVB
DRAM:  496 MiB (capacity:512 MiB, VGA:16 MiB, ECC:off)
MMC:   sdhci_slot0@100: 0, sdhci_slot1@200: 1
Loading Environment from SPI Flash... SF: Detected mx66l51235l with page size 256 Bytes, erase size 64 KiB, total 64 MiB
OK
In:    serial@1e784000
Out:   serial@1e784000
Err:   serial@1e784000
Net:
Warning: ethernet@1e660000 (eth0) using random MAC address - 46:53:41:40:34:bf
eth0: ethernet@1e660000
Warning: ethernet@1e680000 (eth1) using random MAC address - c2:3b:24:eb:64:ca
, eth1: ethernet@1e680000
Hit any key to stop autoboot:  0
## Loading kernel from FIT Image at 20100000 ...
   Using 'conf-aspeed-bmc-asrock-x570d4u.dtb' configuration
   Trying 'kernel-1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x20100104
     Data Size:    3270720 Bytes = 3.1 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x80001000
     Entry Point:  0x80001000
     Hash algo:    sha256
     Hash value:   unavailable
   Verifying Hash Integrity ... sha256 error!
Can't get hash value property for 'hash-1' hash node in 'kernel-1' image node
Bad Data Hash
ERROR: can't get kernel image!

Thanks for the tip off on this. Nice to see we’re getting our yearly BIOS update soon. :slight_smile:

In case anyone’s curious:

Version: 1.78
Date: 4/11/2024
Update Method: Instant Flash
Size: 18.28MB
Description: 1. Update AGESA 1.2.0.C. 2. Fix LogoFAIL security issue.

It’s been my experience that beta BIOSes don’t always list all the features that arrive in the release versions, just the really important ones, like security and AGESA updates.

The latest BIOS has an interface to disable the IPMI channel bond as well as the NCSI IPMI channel. Has that always been there or is that new?

I initially thought I’d bricked the board, but thank goodness I’d had SOCFLASH make a backup image. I was able to set the IP address variables and tftpboot the backup, then do sf probe and sf update 0x83000000 0 to restore to the backed-up image.

Looks like my image just built incorrectly the first time; after getting help in the OpenBMC Discord, I tried the dev’s image, and that worked, then I tried my own rebuilt image, and that worked too.

What works:

  • Web UI
  • iKVM
  • Keyboard input
  • SSH

What doesn’t work:

  • Serial over LAN
  • Fan control (always 100%)
  • Temperature Sensors
  • Hardware Inventory
  • A bunch of other stuff

Other observations:

  • It seems like the NCSI doesn’t like the 10GbE of the -2L2T model.
  • You’ll want to printenv on the stock firmware to get ethaddr and eth1addr, then set them on OpenBMC and saveenv.
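To make that last step less error-prone, here is a small Python sketch that takes a saved copy of the stock firmware's `printenv` output and produces the matching `setenv`/`saveenv` lines to type at the OpenBMC U-Boot prompt. The helper names are made up, the MAC addresses in the example are placeholders, and it assumes the usual `name=value` format of `printenv`; it does not talk to the board itself.

```python
import re

# Six colon-separated hex octets, e.g. 02:00:00:00:00:01
MAC_RE = re.compile(r"^(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$")


def parse_printenv(text):
    """Parse 'name=value' lines from saved printenv output into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            env[name.strip()] = value.strip()
    return env


def mac_restore_commands(printenv_text, names=("ethaddr", "eth1addr")):
    """Build the U-Boot commands that restore the saved MAC addresses.

    Raises ValueError if a variable is missing or does not look like a MAC,
    so a truncated capture cannot silently produce bad commands.
    """
    env = parse_printenv(printenv_text)
    commands = []
    for name in names:
        value = env.get(name, "")
        if not MAC_RE.match(value):
            raise ValueError(f"{name} missing or not a MAC address: {value!r}")
        commands.append(f"setenv {name} {value}")
    commands.append("saveenv")
    return commands


if __name__ == "__main__":
    # Placeholder capture; replace with the real printenv output from your board.
    saved = "baudrate=115200\nethaddr=02:00:00:00:00:01\neth1addr=02:00:00:00:00:02\n"
    print("\n".join(mac_restore_commands(saved)))
```

Capturing the output over the serial console before flashing is the important part; once the stock environment is overwritten, the original addresses are only recoverable from a backup image or the sticker on the board.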