AMD HSMP & eSMI - On-the-fly control of fabric, clocks, power and more

Reference Material from the video

GitHub - amd/amd_hsmp: AMD HSMP module to provide user interface to system management features.

GitHub - amd/esmi_ib_library: E-SMI: EPYC™ System management Interface In-band Library

Processor Programming Reference (PPR) for AMD Family 1Ah Model 02h, Revision C1 Processors (57238)

What is AMD e-SMI?

The EPYC™ System Management Interface In-band Library, or E-SMI library, is part of the EPYC™ System Management Inband software stack. It is a C library for Linux that provides a user space interface to monitor and control the CPU’s power, energy, performance and other system management features.

… this uses the HSMP module, which is the interface to these system management features.

What does this do for us?

If you know Ryzen Master on desktop-class CPUs, this provides an interface to a lot of similar functionality, but for EPYC CPUs. If you need to adjust cTDP all the way down on the fly, you can. That’s one of the power-saving scenarios I demo in the video.
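As a rough illustration of what changing the power limit on the fly looks like underneath E-SMI, here is a minimal Python sketch that talks to the amd_hsmp driver's `/dev/hsmp` device directly. The `struct hsmp_message` layout, the `_IOWR(0xF8, 0, ...)` ioctl number and the message IDs 0x5/0x6/0x7 (set socket power limit, get socket power limit, get max) are my reading of the kernel uapi header `asm/amd_hsmp.h`, not something from the video, so verify them against your headers before use. Values are in milliwatts, it needs root with HSMP enabled in the BIOS, and the firmware may clamp the request further.

```python
import ctypes
import fcntl
import os

HSMP_MAX_MSG_LEN = 8


class HsmpMessage(ctypes.Structure):
    # Assumed to mirror struct hsmp_message from <asm/amd_hsmp.h>.
    _fields_ = [
        ("msg_id", ctypes.c_uint32),
        ("num_args", ctypes.c_uint16),
        ("response_sz", ctypes.c_uint16),
        ("args", ctypes.c_uint32 * HSMP_MAX_MSG_LEN),  # args in, response out
        ("sock_ind", ctypes.c_uint16),
    ]


def _iowr(ioc_type: int, nr: int, size: int) -> int:
    # Linux _IOWR() encoding: dir (read|write = 3) | size | type | nr.
    return (3 << 30) | (size << 16) | (ioc_type << 8) | nr


# Assumed: HSMP_IOCTL_CMD is _IOWR(0xF8, 0, struct hsmp_message).
HSMP_IOCTL_CMD = _iowr(0xF8, 0, ctypes.sizeof(HsmpMessage))

# Message IDs assumed from enum hsmp_message_ids.
HSMP_SET_SOCKET_POWER_LIMIT = 0x05
HSMP_GET_SOCKET_POWER_LIMIT = 0x06
HSMP_GET_SOCKET_POWER_LIMIT_MAX = 0x07


def hsmp_call(msg_id, args=(), response_sz=0, sock=0, dev="/dev/hsmp"):
    """Send one HSMP message and return the response words."""
    msg = HsmpMessage(msg_id=msg_id, num_args=len(args),
                      response_sz=response_sz, sock_ind=sock)
    for i, value in enumerate(args):
        msg.args[i] = value
    fd = os.open(dev, os.O_RDWR)
    try:
        # The driver writes the response back into msg.args.
        fcntl.ioctl(fd, HSMP_IOCTL_CMD, msg)
    finally:
        os.close(fd)
    return list(msg.args[:response_sz])


def clamp_limit_mw(requested_w: float, max_mw: int) -> int:
    """Convert watts to milliwatts, never exceeding the reported max."""
    return max(0, min(int(requested_w * 1000), max_mw))


def set_power_limit_w(requested_w: float, sock: int = 0) -> int:
    """Request a socket power limit; return what the firmware reports back (mW)."""
    (max_mw,) = hsmp_call(HSMP_GET_SOCKET_POWER_LIMIT_MAX, response_sz=1, sock=sock)
    hsmp_call(HSMP_SET_SOCKET_POWER_LIMIT,
              args=(clamp_limit_mw(requested_w, max_mw),), sock=sock)
    (now_mw,) = hsmp_call(HSMP_GET_SOCKET_POWER_LIMIT, response_sz=1, sock=sock)
    return now_mw
```

For example, `set_power_limit_w(120)` would request a 120 W limit on socket 0 and return the limit the firmware now reports (this is the `PowerLimit` row in e_smi_tool's output). As far as I can tell e_smi_tool wraps these same messages, so in practice you would normally just use the tool; the sketch only shows the moving parts.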

It is also possible to fine-tune the Infinity Fabric speed (within supported limits; no overclocking) and to tell the CPU to prioritize, for example, fabric and I/O operations rather than spending the power budget on compute and cores.

Advanced BIOS features like streaming enhancement and L3 cache management policies can also be tweaked on the fly, using the HSMP registers documented in the PPR updates that just dropped.

It is incredibly useful for anyone doing workload diagnostics and fine-tuning. Options that have been in the BIOS since the Rome generation can now be changed on the fly from the command line, which greatly speeds up testing and optimization.

The Key to Using It

(from GitHub)

The HSMP PCIe interface needs to be enabled in the BIOS. The CBS option can be found at the following path:

##### Advanced > AMD CBS > NBIO Common Options > SMU Common Options > HSMP Support

##### BIOS Default: “Auto” (Disabled)

If the option is disabled, calls to the SMU will result in a timeout.

Not all BIOSes expose this option; if your board does not have it, please ping your board vendor to inquire.
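Because a disabled HSMP shows up as timeouts rather than a clear error, a quick preflight check can save some head-scratching. This is a minimal Python sketch, assuming the upstream driver exposes a `/dev/hsmp` device node once it is up. The module names checked (`amd_hsmp`, `hsmp_acpi`) are assumptions that vary by kernel version, and a driver built into the kernel may not appear under `/sys/module` at all.

```python
from pathlib import Path

# Assumed module names: older kernels ship a single amd_hsmp module,
# newer ones may also have an ACPI variant (hsmp_acpi).
CANDIDATE_MODULES = ("amd_hsmp", "hsmp_acpi")


def hsmp_status(root: str = "/") -> dict:
    """Report whether /dev/hsmp exists and which HSMP modules appear loaded."""
    base = Path(root)
    loaded = [m for m in CANDIDATE_MODULES
              if (base / "sys" / "module" / m).is_dir()]
    return {"device_node": (base / "dev" / "hsmp").exists(), "modules": loaded}


def diagnose(status: dict) -> str:
    """Turn the status dict into a suggested next step."""
    if status["device_node"]:
        return "ok: /dev/hsmp is present"
    if status["modules"]:
        return ("module loaded but no /dev/hsmp: check dmesg and set "
                "HSMP Support to Enabled in BIOS")
    return "driver not loaded: try 'modprobe amd_hsmp', then check dmesg"


if __name__ == "__main__":
    print(diagnose(hsmp_status()))
```

The `root` parameter only exists so the check can be pointed at a test directory; on a real system leave it at `/`.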

Does this work on Threadripper?

No, not presently. If this is something you’d like to see, however, please, PLEASE engage and let AMD know so they can do the qualification work to get it going.


Hey Wendell! Cool video, and thanks for the description. Is there a way to use it to monitor the EPYC CPU properly via Grafana/Telegraf or something like that? Sure, it can do much more, but monitoring would be awesome as a start.

Yes, and in fact, if you paste the relevant 3-4 pages from the PDF into Claude or ChatGPT, it can write a plugin that gets you most of the way there.
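A minimal Python sketch of the monitoring idea, without the LLM step: parse the socket table that e_smi_tool prints and emit one InfluxDB line-protocol record, which Telegraf's `inputs.exec` plugin can ingest with `data_format = "influx"`. The row layout is taken from the Threadripper dump later in this thread, so other tool versions may differ; the measurement name `esmi` and the field names are hypothetical, and only a single-socket table is handled.

```python
import re

# Table label -> line-protocol field name (hypothetical naming).
FIELDS = {
    "Power (Watts)": "power_w",
    "PowerLimit (Watts)": "power_limit_w",
    "PowerLimitMax (Watts)": "power_limit_max_w",
    "C0 Residency (%)": "c0_residency_pct",
    "DDR Utilized BW (GB/s)": "ddr_bw_used_gbps",
    "Freq limit (MHz)": "freq_limit_mhz",
}

# Matches two-column rows such as "| Power (Watts)   | 96.357   |".
# Rows whose value is not numeric (e.g. "NA (Err: 1 )") are skipped.
ROW = re.compile(r"^\|\s*(?P<label>[^|]+?)\s*\|\s*(?P<value>-?[\d.]+)\s*\|\s*$")


def to_line_protocol(tool_output: str, socket: int = 0) -> str:
    """Convert e_smi_tool's socket table into one line-protocol record."""
    fields = {}
    for line in tool_output.splitlines():
        m = ROW.match(line)
        if m and m.group("label") in FIELDS:
            fields[FIELDS[m.group("label")]] = float(m.group("value"))
    body = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"esmi,socket={socket} {body}" if body else ""
```

A wrapper script run by Telegraf could print `to_line_protocol(subprocess.run(["e_smi_tool"], capture_output=True, text=True).stdout)`; since the tool talks to `/dev/hsmp`, it will likely need root.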


How recent does the Epyc need to be, does this work for Naples or Rome too?


Sure does. It depends on the board more than the chip. Technically Milan is the first supported generation, but it happens to work (unofficially and unsupported) on the Rome board I tested with a recent BIOS.

You might need to build your own E-SMI lib too, however, as the registers are somewhat different (but still documented in the updated docs).

Naples is too old


How best to engage AMD regarding Threadripper?

igotchufam, already hitting that angle hard

Shame Naples is too old, still have one in use as a home server.

Does this work on Threadripper?

Good thing I did not find this thread earlier since it sort of works on a Threadripper PRO 7975WX:

```text
============================= E-SMI ===================================

--------------------------------------
| CPU Family            | 0x19 (25 ) |
| CPU Model             | 0x18 (24 ) |
| NR_CPUS               | 64         |
| NR_SOCKETS            | 1          |
| THREADS PER CORE      | 2 (SMT ON) |
--------------------------------------

-----------------------------------------------------
| Sensor Name                    | Socket 0         |
-----------------------------------------------------
| Energy (K Joules)              | NA (Err: 1 )     |
| Power (Watts)                  | 96.357           |
| PowerLimit (Watts)             | 350.000          |
| PowerLimitMax (Watts)          | 2000.000         |
| C0 Residency (%)               | 0                |
| DDR Bandwidth                  |                  |
|        DDR Max BW (GB/s)       | 307              |
|        DDR Utilized BW (GB/s)  | 0                |
|        DDR Utilized Percent(%) | 0                |
| Current Active Freq limit      |                  |
|        Freq limit (MHz)        | 4471             |
|        Freq limit source       | Refer below[*0]  |
| Socket frequency range         |                  |
|        Fmax (MHz)              | 5350             |
|        Fmin (MHz)              | 400              |
-----------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
Failed: to get CPU energies, Err[1]: Energy driver not present
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz:                                                                                        |
| cpu [  0] : 5350  5350  5350  5350  5350  5350  5350  5350  NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 16] : 5350  5350  5350  5350  5350  5350  5350  5350  NA    NA    NA    NA    NA    NA    NA    NA      |
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU core clock in MHz:                                                                                        |
| cpu [  0] : 2216  2216  2216  2216  2216  2216  2216  2216  1977  4450  1977  1977  2216  2216  2216  2216    |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
-----------------------------------------------------------------------------------------------------------------
*0 Frequency limit source names:
 Reserved

Err[1]: Energy driver not present
============================= End of E-SMI ============================
```

I can’t get amd_energy to compile right now, but that’s what issues are for.

I put all that in my video today! It does mostly work on 7000/9000-series TR.

Anything in dmesg when you modprobe amd_hsmp?

You may need to explicitly enable it in the BIOS, as Auto may be only a partial enablement.

Any chance for Threadripper 5000?

Maybe, if you have the bios option to enable it.

It seemed to compile fine on Ubuntu 25.04, FWIW.

Confirmed working here on a 7960X / ASUS TRX50, BIOS 1203.

There was a bit of a downpour of:

```text
[1144518.791246] amd_hsmp amd_hsmp: Message ID 0xA failure : Invalid arguments (status = 0xFF)
[1144518.794070] amd_hsmp amd_hsmp: Message ID 0x1A failure : Invalid arguments (status = 0xFF)
```

…but only while running e_smi_tool. FWIW, these also appear in journalctl.

I suspect they are related to the missing energy driver…

…which now compiles but does not load 🤷:

```text
$ modprobe -v amd_energy
insmod /lib/modules/6.15.6-1.el9.elrepo.x86_64/extra/amd_energy.ko.xz
modprobe: ERROR: could not insert 'amd_energy': No such device
```

I may indeed have to go back into the BIOS, but I need to wait for some downtime when the 4 dual 100G NICs are not pumping data.

PS: I also have to make sure I get the new BIOS for the Pro WS WRX90E-SAGE SE.
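To see which HSMP messages are actually failing, a small Python sketch can tally the amd_hsmp failure lines from `dmesg` or `journalctl -k` output. The message-ID names are my reading of `enum hsmp_message_ids` in the kernel uapi header `asm/amd_hsmp.h` and should be checked against your kernel. If they hold, 0xA and 0x1A are per-core boost-limit and frequency-limit queries, which would fit the per-core `NA` columns in the e_smi_tool dump earlier in the thread better than the missing energy driver; treat that as a lead, not a diagnosis.

```python
import re
from collections import Counter

# Matches e.g. "amd_hsmp amd_hsmp: Message ID 0xA failure : Invalid arguments (status = 0xFF)"
FAIL = re.compile(
    r"amd_hsmp.*Message ID (0x[0-9A-Fa-f]+) failure : (.+?) \(status = (0x[0-9A-Fa-f]+)\)"
)

# Assumed names from enum hsmp_message_ids; only the IDs seen in this thread.
HSMP_MSG_NAMES = {
    0x0A: "GET_BOOST_LIMIT",
    0x1A: "GET_CCLK_CORE_LIMIT",
}


def tally_failures(log_text: str) -> Counter:
    """Count failures keyed by (message ID, assumed name, error text)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FAIL.search(line)
        if m:
            msg_id = int(m.group(1), 16)
            name = HSMP_MSG_NAMES.get(msg_id, "unknown")
            counts[(f"0x{msg_id:X}", name, m.group(2))] += 1
    return counts
```

Feeding it the text of `dmesg` (or `journalctl -k`) captured while e_smi_tool runs shows at a glance whether only a couple of message IDs are affected.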

Ah, yes. You need the new BIOS and to explicitly enable HSMP in the AMD menus. That’s why I’m so excited about the platform updates: everyone gets upgraded, even early adopters with old CPUs.


After the new BIOS it looks the same, but I admit that I left the HSMP option on “Auto”. Should I force it to “Enabled”?

Probably? What board was this? We’ve got quite a few “works for me” reports at this point.