Tool for adjusting CPU power limit under Linux? (Intel and AMD)

We are looking to finally get proper UPSs for our servers and “servers”, but the prices of 3+ kVA rackmount UPSs and additional external battery packs are rather steep.
So I was thinking of another option: forgo the extra battery packs, and get the required holdover time by cutting back power consumption on the servers when on battery. We can tolerate all servers dropping down to something like 1 GHz until AC power comes back.

I figured the best way to do it would probably be to have a script lower the power limit that the turbo algorithm uses (PL1/PL2 for Intel, PPT for AMD).
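A rough sketch of the trigger side of such a script, assuming the UPS is monitored with Network UPS Tools (NUT) and its upsc client. The UPS name myups@localhost and the function names are placeholders, and the actual cap/restore step is left to whichever mechanism ends up being used:

```sh
#!/bin/sh
# Hypothetical UPS name as configured in NUT; adjust to the local setup.
UPS_NAME="${UPS_NAME:-myups@localhost}"

# Map a NUT ups.status string (e.g. "OL CHRG", "OB DISCHRG") to an action.
# OB = on battery, OL = on line power.
action_for_status() {
    case " $1 " in
        *" OB "*) echo cap ;;
        *" OL "*) echo restore ;;
        *)        echo none ;;      # unknown or unreadable status: do nothing
    esac
}

# Query the UPS once and print the action to take.
poll_once() {
    status=$(upsc "$UPS_NAME" ups.status 2>/dev/null) || status=""
    action_for_status "$status"
}
```

Calling poll_once from a loop or a timer and acting on "cap" / "restore" would be one way to wire it up; NUT can also run hooks on power events by itself, which may be the cleaner route.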

Are there any good command line utilities for adjusting these values on Haswell-E, Skylake-X/SP, Zen1 and Zen2? I know the governor can be set to “powersave” and that does work to a certain extent, but it is not exactly what I am looking for.

That is not going to make much of a difference. Maybe write a script that instead works to minimize load on the CPU, putting VMs into standby, or shutting servers down altogether.

I have a server room full of expensive rackmount UPSes, and I am feeding them batteries every few years. I wish I had a 20 kVA cabinet UPS… If you have the room, look at something like that, especially one that takes lithium batteries, so you get a longer life.

The servers are running AVX loads on pretty much all cores, at pretty much all times. Thus, the CPUs are typically running either at or close to their stock power limits, as they are. No GPUs to worry about, so reducing the power limit from 165 W to say 60 W on a Skylake-X CPU would surely go a long way.

We do not use VMs at all, all stuff runs completely bare metal. (EDIT: I mean just under Linux, without any VM/hypervisor)

Shutting the servers down completely would be extremely disruptive. For us, the whole point of having UPSs is to preserve RAM contents during an outage. If we did not care about preserving system RAM contents, we would not bother with UPSs for our compute boxes, but unfortunately some calculations run for weeks and do not make meaningful checkpoints.

On my HPE machines, in iLO, I can set the power manager to performance, balanced or power saver, and also cap power at X number of watts. On my Windows compute boxes I have it set to performance with no caps. I’ve never tried changing the power manager while the OS was running. Not sure if it would do anything, or if it just reconfigures BIOS settings for the next reboot?

I think Dell has a similar feature in iDRAC.

What server hardware are you running?

For Linux, you can play with the governor:
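As a rough sketch, switching the governor on every core through sysfs could look like this. It assumes the usual cpufreq layout under /sys/devices/system/cpu; CPU_ROOT and set_governor are names made up for the example, and which governors are actually available depends on the cpufreq driver (scaling_available_governors lists them).

```sh
#!/bin/sh
# CPU_ROOT is a hypothetical override so the function can be tried on a scratch directory.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Write the given governor name to every CPU that exposes a writable cpufreq policy.
set_governor() {
    for f in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$f" ]; then
            echo "$1" > "$f"
        fi
    done
}
```

Run as root, set_governor powersave would be the on-battery call and set_governor performance (or whatever the box normally uses) the way back.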

Well, remember when I said servers and “servers”? We have 2 Fujitsu servers, with Gigabyte EPYC servers on the way, but the bulk is an assortment of racked Intel HEDT and AM4 desktops.

Under Windows, these values can be easily adjusted with Intel XTU and AMD Ryzen Master.

The governor is OK. I have experimented with it already, and managed to reduce CPU power substantially. If no better option is found, that will be the solution, as it is quite easy to use.

BTW, changing the power limit on a running system is non-persistent and the BIOS settings are not really involved.

In the case of Intel, one has to write the correct values to some MSRs of the CPUs, and that’s it. I am pretty sure this is found in public documentation, but it is rather dense low-level stuff, and I would prefer to avoid writing my own tool to twiddle CPU register bits if someone else has already done the work.
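To give an idea of what the register twiddling involves: per the Intel SDM, the package limits live in MSR 0x610 (MSR_PKG_POWER_LIMIT) and are scaled by the power unit in MSR 0x606 (MSR_RAPL_POWER_UNIT). A minimal read-only sketch of the decoding, with a made-up function name and illustrative raw values:

```sh
#!/bin/sh
# Decode PL1/PL2 from raw MSR values.
#   $1 = raw MSR_RAPL_POWER_UNIT (0x606), $2 = raw MSR_PKG_POWER_LIMIT (0x610)
# Power unit is 1/2^N watts, with N in bits 3:0 of 0x606.
# PL1 sits in bits 14:0 and PL2 in bits 46:32 of 0x610.
rapl_decode_limits() {
    unit_raw=$(( $1 ))
    limit_raw=$(( $2 ))
    pu=$(( unit_raw & 0xF ))
    pl1=$(( limit_raw & 0x7FFF ))
    pl2=$(( (limit_raw >> 32) & 0x7FFF ))
    # Report in milliwatts to stay within integer arithmetic.
    echo "PL1=$(( (pl1 * 1000) >> pu )) mW PL2=$(( (pl2 * 1000) >> pu )) mW"
}
```

For example, rapl_decode_limits 0xA0E03 0x864000008528 decodes to 165 W and 200 W. On a real box the raw values would have to be read with something like rdmsr from msr-tools, which I am assuming here rather than having tested on these platforms.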

I believe this works on desktop/server Intel systems, although I’ve only tried it on laptops.

$ cat /usr/local/bin/set-rapl
#!/bin/sh

echo 1        > /sys/class/powercap/intel-rapl:0/enabled
echo 30000000 > /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
echo 30000000 > /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw

That’s what I use to keep my Dell XPS power use to 30W and temperature down to 80C. If I let it run at the default 45W it bounces off the 95C thermal limit constantly. I should probably open it up and repaste it.

See if you can use the powercap RAPL for your use-case.
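For a multi-socket box that has to go back to normal once AC returns, the same idea could be stretched to something like the sketch below. It assumes every package shows up as an intel-rapl:N entry under /sys/class/powercap; RAPL_ROOT, the function names and the state file are made up for illustration.

```sh
#!/bin/sh
# RAPL_ROOT is a hypothetical override, handy for trying this on a scratch directory.
RAPL_ROOT="${RAPL_ROOT:-/sys/class/powercap}"

# List package-level zones (intel-rapl:N), skipping subzones like intel-rapl:0:0.
rapl_packages() {
    for z in "$RAPL_ROOT"/intel-rapl:*; do
        case "${z##*/}" in
            *:*:*) continue ;;
        esac
        if [ -d "$z" ]; then echo "$z"; fi
    done
}

# Save the current long-term and short-term limits to the file given as $1.
rapl_save() {
    for z in $(rapl_packages); do
        for c in 0 1; do
            f="$z/constraint_${c}_power_limit_uw"
            if [ -r "$f" ]; then echo "$f $(cat "$f")"; fi
        done
    done > "$1"
}

# Cap both constraints of every package to $1 watts (sysfs wants microwatts).
rapl_cap_watts() {
    for z in $(rapl_packages); do
        for c in 0 1; do
            f="$z/constraint_${c}_power_limit_uw"
            if [ -w "$f" ]; then echo $(( $1 * 1000000 )) > "$f"; fi
        done
    done
}

# Write back the limits recorded by rapl_save in the file given as $1.
rapl_restore() {
    while read -r f uw; do
        echo "$uw" > "$f"
    done < "$1"
}
```

On battery that would be rapl_save /run/rapl.saved followed by rapl_cap_watts 60, and rapl_restore /run/rapl.saved when power is back. All of it needs root.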

I don’t know how to do it for AMD.

And here’s another fun little script that uses the RAPL system to report current energy usage:

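#!/bin/bash

zone=/sys/class/powercap/intel-rapl:0
# The energy counter wraps back to zero after reaching this value.
max=$(<"$zone/max_energy_range_uj")
last=$(<"$zone/energy_uj")
while sleep 1
do
        x=$(<"$zone/energy_uj")
        delta=$((x - last))
        # Correct for a counter wraparound between two samples.
        ((delta < 0)) && delta=$((delta + max))
        # One microjoule per second is one microwatt.
        printf "%'d \u03bcW\n" "$delta"
        last=$x
done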

Thanks, that is probably exactly what I wanted for Intel at least.

From some stuff I just read online, this may not be enabled by default. There’s a /sys/class/powercap/intel-rapl:0/enabled file that may have to be set. Apparently laptop BIOSes always turn it on, but servers may not.

So you may have to add that to my script.

Wouldn’t it be easier to set the maximum frequency with:

cpupower frequency-set -u clock_freq

That is also what I would have suggested.

You should also be able to do that via the proc/sysfs thingies.

It has been a while since I dug through those.
I remember the governor being an asshole though, last time I tried to OC an unlocked Broadwell in Linux via MSR turbo values.
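For the sysfs route, a minimal sketch could look like the following. It assumes the standard cpufreq files scaling_max_freq and cpuinfo_max_freq (both in kHz); CPUFREQ_ROOT and the function names are invented for the example.

```sh
#!/bin/sh
# CPUFREQ_ROOT is a hypothetical override so this can be tried on a scratch directory.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu}"

# Limit every core to at most $1 MHz (the sysfs file takes kHz).
set_max_freq_mhz() {
    khz=$(( $1 * 1000 ))
    for d in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq; do
        if [ -w "$d/scaling_max_freq" ]; then
            echo "$khz" > "$d/scaling_max_freq"
        fi
    done
}

# Lift the limit again by copying the hardware maximum back into place.
restore_max_freq() {
    for d in "$CPUFREQ_ROOT"/cpu[0-9]*/cpufreq; do
        if [ -r "$d/cpuinfo_max_freq" ] && [ -w "$d/scaling_max_freq" ]; then
            cat "$d/cpuinfo_max_freq" > "$d/scaling_max_freq"
        fi
    done
}
```

set_max_freq_mhz 1000 would give roughly the 1 GHz mentioned at the top, and restore_max_freq undoes it. Both need root.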

You could use containers, by that I mean cgroups to throttle the workload.

You wouldn’t necessarily be inputting watts into the kernel, but you could make a simple linear estimate of CPU shares per watt, divide it by e.g. 5, and just run the tool in a loop every second or every 100 ms to have a similar effect to capping power.
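A minimal sketch of the throttling part, assuming cgroup v2 with the cpu controller enabled for the group. Note that it uses the cpu.max bandwidth quota rather than shares, since a quota is a hard ceiling; the cgroup path in the usage line and the function names are hypothetical.

```sh
#!/bin/sh
# Throttle a cgroup to a percentage of the machine's total CPU time.
#   $1 = cgroup directory, $2 = percent of all CPUs, $3 = CPU count (defaults to nproc)
# cpu.max takes "<quota> <period>" in microseconds; the quota may exceed the
# period on multi-core systems.
cg_throttle() {
    cg_dir=$1
    percent=$2
    ncpus=${3:-$(nproc)}
    period=100000
    quota=$(( period * ncpus * percent / 100 ))
    echo "$quota $period" > "$cg_dir/cpu.max"
}

# Remove the quota again.
cg_unthrottle() {
    echo "max 100000" > "$1/cpu.max"
}
```

Dividing by 5 as suggested would be cg_throttle /sys/fs/cgroup/compute 20, with the jobs running inside that group. How closely 20% of CPU time tracks 20% of power would have to be measured, it is only the linear estimate from above.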

Reducing the maximum frequency should be more power efficient, as the silicon becomes more efficient when run at a lower frequency (lower clocks also allow a lower voltage).
