AMD P-state driver

Ryzen will happily draw as much power as it can while there is thermal and electrical headroom, even if it doesn’t actually need that much power to hit a given frequency. The solution to rampant power consumption is to lower the limits that define that headroom: PPT (package power), TDC (sustained current), and EDC (peak current). SVI2_P_Core on my 5950X system used to peak around 135W under sustained multicore loads; now it peaks around 75W with very little performance impact.

[Screenshot: 135W, amd_pstate=guided]

[Screenshot: 75W, amd_pstate=guided]

If you’re interested in doing something similar, you can try enabling ECO mode. However, on my two Asus X570 boards, that seems to set the limits too low; SVI2_P_Core peaks around 35W in that configuration. My solution is to disable PBO Limits, which seems to leave PBO enabled (with my curve, boost offset, etc.) but apply the processor’s default limits.

As for amd_pstate=guided itself, I think the problem is that Linux scaling governors are currently somewhat limited in their ability to manipulate the minimum/maximum performance registers as described above. Per the last part of the patch series notes:

In guided autonomous mode the min_perf is based on the input from the scaling governor. For example, in case of schedutil this value depends on the current utilization. And max_perf is set to max capacity.

So there’s no apparent way to lower max_perf for a process (group) or a list of CPUs in this mode.
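Before switching modes, it can help to confirm which one the driver is actually running in. Here is a minimal sketch, assuming a kernel recent enough to expose the amd_pstate status file in sysfs; the function name and the optional path argument are mine, just for illustration.

```python
from pathlib import Path

# Assumption: recent kernels expose the driver mode here
# (values such as "active", "passive", "guided", or "disable").
STATUS_FILE = Path("/sys/devices/system/cpu/amd_pstate/status")


def amd_pstate_mode(status_file=STATUS_FILE):
    """Return the current amd_pstate mode, or None if the file is absent."""
    try:
        return Path(status_file).read_text().strip()
    except FileNotFoundError:
        # Driver not loaded, or the kernel is too old to expose the file.
        return None
```

Calling amd_pstate_mode() should return "guided" on a system booted with amd_pstate=guided, and None where the driver isn’t in use.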

Fully-autonomous mode (amd_pstate=active) might be the better fit for your use case. From the Arch wiki:

The most important feature of active governing is that only two governors appear available, powersave and performance. They do not work at all like their normal counterpart, however: these levels are translated into an Energy Performance Preference hint for the CPU’s internal governor.

[…]

It is possible to select in-between hints with the sysfs interfaces available. […] One can also pass a number between 0 (favor performance) and 255 (favor power).

So you should be able to use this mode and then set EPP hints to a value 0–255 for each CPU, then assign each process to a list of CPUs to get the desired scaling behavior for that process. (And maybe this is what you’re trying to do already, but until you get power consumption wrangled at a high level this probably won’t help that much.)
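Roughly, that could look like the sketch below. It assumes amd_pstate=active, the standard cpufreq sysfs layout (energy_performance_preference under each cpuN/cpufreq directory), and root privileges for the writes; the helper names and the root argument are hypothetical. As far as I understand, the governor also has to be powersave for hints other than performance to be accepted, so treat that as something to verify on your kernel.

```python
import os
from pathlib import Path

# Assumption: standard cpufreq sysfs layout on a system booted with amd_pstate=active.
CPU_ROOT = Path("/sys/devices/system/cpu")


def epp_path(cpu, root=CPU_ROOT):
    """Path of the EPP hint file for one logical CPU."""
    return Path(root) / f"cpu{cpu}" / "cpufreq" / "energy_performance_preference"


def set_epp(cpus, hint, root=CPU_ROOT):
    """Write an EPP hint for each CPU in `cpus`.

    `hint` is either a named preference (e.g. "balance_power") or an
    integer from 0 (favor performance) to 255 (favor power).
    """
    if isinstance(hint, int) and not 0 <= hint <= 255:
        raise ValueError("numeric EPP hint must be in 0..255")
    for cpu in cpus:
        epp_path(cpu, root).write_text(f"{hint}\n")


def pin_process(pid, cpus):
    """Restrict process `pid` to the given logical CPUs (Linux only)."""
    os.sched_setaffinity(pid, set(cpus))
```

For example, set_epp(range(8, 16), 192) followed by pin_process(pid, range(8, 16)) would bias CPUs 8–15 toward power saving and confine a process to them. The CPU numbers and hint value here are placeholders, not a recommendation.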