Confirmed - Windows performance problems on SPR 3400/2400

Hot off the press… confirms what many of us running Windows on Sapphire Rapids W2400/3400 have been seeing.

Windows* Responsiveness on Systems with Intel® Xeon® W-3400…

Let’s hope they have some decent fix for this, now that it’s a known Sapphire Rapids issue. Hey, maybe they will just give us all Emerald Rapids CPUs as replacements…

I ran the High Performance plan because the Windows scheduler was being dumb. Linux seemed to work okay in its default on-demand config.

For me, I still had weird Windows perf glitches until I disabled GPU accelerated scheduling.

Excerpt from the main W790 thread to quantify the problem; it isn’t limited to application startup or the Balanced power profile:

Here’s where things get interesting: I switched to Linux just to see how performance was for FEA, and the same FEA benchmark that I had previously run on Windows ran 30% faster! The benchmark is not known to give very large discrepancies between Linux and Windows on previous architectures. I think this confirms there is something wrong with SPR-WS on Windows. The benchmark is not perfectly deterministic, but it is fairly consistent run to run.

For comparison, here are COMSOL benchmark runtimes on other systems:
5995WX runs 27m 3s
Dual 8173M run 26m 42s
Dual E5-2697A run 44m 0s
Overclocked W5-3435X runs 17m 10s (on Linux)
Overclocked W5-3435X runs 23m 24s (on Windows)
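
To put a number on the gap, here is a quick sketch of the arithmetic on the two W5-3435X runs listed above. The helper names to_seconds and compare_runs are made up for this example.

```shell
#!/bin/sh
# Convert a "Mm Ss" runtime to seconds.
to_seconds() {
    # $1 = minutes, $2 = seconds
    echo $(( $1 * 60 + $2 ))
}

# Compare a slower and a faster run, both given in seconds.
compare_runs() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        printf "speedup: %.2fx, time saved: %.1f%%\n", slow / fast, (slow - fast) / slow * 100
    }'
}

windows=$(to_seconds 23 24)   # 23m 24s on Windows
linux=$(to_seconds 17 10)     # 17m 10s on Linux
compare_runs "$windows" "$linux"
```

This prints "speedup: 1.36x, time saved: 26.6%", so the quoted "30% faster" sits between the two ways of expressing the same difference.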

I found that if I disabled the C6 sleep state, the machine behaves just fine. It uses more power at idle, as expected, but the machine is usable with Windows. Linux was just fine with no tweaks at all. It seemed strange to me that disabling C6 made things OK on Windows while Linux did not need it. Now I just use my steamdeck to switch between balanced and ultra performance when needed.

Is it just me, or does Windows vs Linux seem backwards these days? When I first got interested in Linux, I remember hearing advice about disabling sleep states etc. to get it to cooperate.

Interesting; I just assumed this was Normal Windows Things. On Intel I always set High Performance, even on Alder/Raptor Lake… and on AMD I set Balanced.

The problem does also exist on Linux, but it’s a lot less noticeable. For a long-running or multi-threaded workload, it’s only visible for the first few seconds. For a bursty single-threaded workload (e.g., rendering a web page in Firefox), however, it is perceptibly slower.

I have a single thread unit test that does the latter on Linux, which finishes in 10 minutes on powersave governor, and 3 minutes on performance. Funnily enough, just having stress-ng --cpu 1 running in the background made it finish in 3 minutes.
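
A rough way to reproduce that comparison is to time the same command with and without a single-core background load. This is only a sketch: time_with_load and the LOAD_CMD variable are names invented here, and it assumes stress-ng is installed when LOAD_CMD is left at its default.

```shell
#!/bin/sh
# Run a command while a background load keeps one core busy, then report
# the wall-clock time of the command in whole seconds.
time_with_load() {
    # LOAD_CMD is a hypothetical override; the default mirrors the
    # "stress-ng --cpu 1" invocation mentioned above.
    load=${LOAD_CMD:-stress-ng --cpu 1}
    $load >/dev/null 2>&1 &          # intentionally unquoted: split into words
    load_pid=$!

    start=$(date +%s)
    "$@"                             # the command being measured
    status=$?
    end=$(date +%s)

    # Stop the background load and reap it.
    kill "$load_pid" 2>/dev/null
    wait "$load_pid" 2>/dev/null || true

    echo "elapsed: $(( end - start ))s"
    return "$status"
}
```

For example, compare time_with_load make test against a plain time make test, where make test stands in for whatever single-threaded job you want to measure.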

On Linux, there are two more knobs that can affect this behavior:

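  • EPP located at /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference (takes one of the names listed in energy_performance_available_preferences, such as performance, balance_performance, balance_power and power, ordered from fastest to most power-saving)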
  • EPB located at /sys/devices/system/cpu/cpu*/power/energy_perf_bias (0-15 where 0 is performance and 15 is powersave)
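
Before changing anything, it can help to see what each CPU is currently set to. A minimal sketch, assuming the intel_pstate driver exposes both files; show_knobs and its optional root argument are hypothetical and only there so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print governor, EPP and EPB for every cpuN directory under the given root.
# Missing files are reported as "n/a" instead of aborting.
show_knobs() {
    root=${1:-/sys/devices/system/cpu}
    for cpu in "$root"/cpu[0-9]*; do
        [ -d "$cpu" ] || continue
        gov=$(cat "$cpu/cpufreq/scaling_governor" 2>/dev/null || echo n/a)
        epp=$(cat "$cpu/cpufreq/energy_performance_preference" 2>/dev/null || echo n/a)
        epb=$(cat "$cpu/power/energy_perf_bias" 2>/dev/null || echo n/a)
        echo "${cpu##*/}: governor=$gov epp=$epp epb=$epb"
    done
}
```

Calling show_knobs with no argument reads the real sysfs tree; reading these files does not require root, only writing does.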

By default, Linux will set energy_perf_bias to 6 if it was set to 0 by the BIOS (i.e. the “Boot Performance”-esque setting in most BIOSes). On SPR, I’ve found that setting this value to 4 (balance-performance) or 0 (performance) fixes the slowness for the aforementioned bursty single-threaded use case without completely disabling the C6 state.
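
The energy_perf_bias file also accepts symbolic names in place of numbers. A small sketch of the mapping used by the kernel’s x86 EPB code (the epb_name helper is a made-up name for illustration):

```shell
#!/bin/sh
# Translate a numeric EPB value into the symbolic name the kernel accepts.
# Values without a symbolic name are reported as raw numbers.
epb_name() {
    case "$1" in
        0)  echo performance ;;
        4)  echo balance-performance ;;
        6)  echo normal ;;
        8)  echo balance-power ;;
        15) echo power ;;
        *)  echo "raw value $1" ;;
    esac
}
```

Note that the EPB names use hyphens (balance-performance) while the EPP names use underscores (balance_performance), which is easy to mix up when scripting both.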

I’m not aware of such a knob on Windows, though I think you could write a program that manually writes to MSR 0x1B0 (IA32_ENERGY_PERF_BIAS) to control this behavior. It does appear that Windows is using a value lower than 6 for the Balanced Power Profile.

On a Supermicro board, there’s an explicit setting that disables the OS’s ability to override EPB, so I can confirm the effect of the EPB value on Windows. Setting this value on the Supermicro board results in the following (on Windows with the Balanced profile):

CPU Configuration > Advanced Power Management Configuration

  • Power Technology: Custom
  • Power Performance Tuning: BIOS Controls EPB
  • ENERGY_PERF_BIAS_CFG: either
    • Balanced Performance (idles ~50W, some lag, but not as annoying)
    • Maximum Performance (idles ~70W, no lag)

When this is set to Balanced Power, Windows idles at ~50W, but with severe lag. Anecdotally, Supermicro has switched the default setting to “BIOS Controls EPB” since BIOS 1.2, released last month.

My best guess at what the EPB knob does is that it makes the processor enter the C6 sleep state less often, and/or keeps at least a single core out of that state. Setting EPB = 0 also makes the processor always boost at least a single core (while EPB = 4 doesn’t do that). It’s still different from the High Performance profile, as Windows would still try to park the cores.
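
One way to check that guess on Linux is to look at the per-CPU cpuidle counters before and after changing EPB. A minimal sketch, assuming the kernel exposes cpuidle in sysfs; list_idle_states is a hypothetical helper, and whether a state called C6 shows up depends on the idle driver in use.

```shell
#!/bin/sh
# List the idle states of one CPU with their entry counts and disable flags.
# $1 = CPU directory (defaults to cpu0 in the real sysfs tree).
list_idle_states() {
    cpu_dir=${1:-/sys/devices/system/cpu/cpu0}
    for st in "$cpu_dir"/cpuidle/state[0-9]*; do
        [ -d "$st" ] || continue
        name=$(cat "$st/name")
        usage=$(cat "$st/usage")       # number of times the state was entered
        disabled=$(cat "$st/disable")  # 1 means the state is disabled for this CPU
        echo "${st##*/} $name usage=$usage disabled=$disabled"
    done
}
```

If the usage count for the deepest state grows more slowly with EPB at 0 or 4 than at 6, that would support the guess above.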

On Linux, I’ve resorted to this script, which does what I expect for what I would call a “balanced” profile (run: /path/to/script.sh balanced):

#!/bin/sh

PROFILE=${1:-powersave}

EPB_DEFAULT=
EPP_DEFAULT=

case "$PROFILE" in
    performance )
        GOVERNOR=performance
        EPB_DEFAULT=performance
        EPP_DEFAULT=performance
        ;;
    balanced )
        GOVERNOR=powersave
        EPB_DEFAULT=performance
        EPP_DEFAULT=performance
        ;;
    powersave )
        GOVERNOR=powersave
        EPB_DEFAULT=balance-power
        EPP_DEFAULT=balance_power
        ;;
    * )
        echo >&2 "Unknown profile: $PROFILE"
        exit 1
        ;;
esac

EPB=${2:-$EPB_DEFAULT}
EPP=${3:-$EPP_DEFAULT}

echo "Setting CPU Governor: $GOVERNOR"
cpupower frequency-set --governor "$GOVERNOR" >/dev/null

echo "Setting CPU Governor Energy Performance Preference: $EPP"
for n in /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference; do
    if ! [ -f "$n" ]; then
        echo "Could not set Energy Performance Preference"
        break
    fi
    echo "$EPP" > "$n"
done

echo "Setting CPU Energy Performance Bias: $EPB"
for n in /sys/devices/system/cpu/cpu*/power/energy_perf_bias; do
    if ! [ -f "$n" ]; then
        echo "Could not set Energy Performance Bias"
        break
    fi
    echo "$EPB" > "$n"
done

Thanks for this, and agree with you!

This has been somewhat of a tinkertoy to get working right. I too had to set BIOS control for EPB on this system. If I left it Native with the OS controls, the fans would not spin up at all. HWiNFO would show temps at 95C after a few seconds of anything intensive like Prime95.

What I have noticed is that while the machine is performing OK with BIOS control and C6 disabled, the boost clock speed on this 2455X is not going over 3.9 GHz. If I just left everything at BIOS defaults and set ultra high performance, I would get 4.4 GHz at times.

In HWiNFO, PL1 is 200W and PL2 is 241W. If I look at the max power usage on this machine with BIOS control and C6 disabled, it tops out at about 215W. When I look at the limit reasons in HWiNFO, it shows the voltage caps being hit (Core x Power Limit Exceeded, Electrical Design Point and Package Level RAPL all trigger to “yes”, with AVX sometimes going “yes”). I find it strange that the power is far below PL2, but it’s saying it’s voltage limited.

Oddly enough, XTU has never behaved well on this particular machine. It always complained about undervolting, even with Core Isolation and Memory Integrity disabled. Since I disabled C6, I no longer get the undervolt warning, but it refuses to load due to the VCHI or whatever message.

These little issues are leading me to think there’s more going on with SPR than meets the eye. If you google Microsoft’s recommendations for Windows Server, the general rule is to set your server to “High Performance” to avoid any lag. So some of the problem is masked by that “just apply high performance and forget it” advice. In the case of my machine, that’s a 50-60W idle difference, not to mention all the heat it throws. It’s a bit of a quest to get the power down on this machine. I expected SPR to be much hotter, louder, and more power hungry than my older W3235, but not these usability issues that have cropped up.

Power is watts, watts is voltage times amps. So it can make sense.
To oversimplify it, more complex operations can use more current (amps) at same voltage and frequency.

Well, I guess that’s more of a side note than something that directly addresses being voltage limited. Maybe they went with a harder limit on the voltage/frequency curve for less variable performance, instead of the later consumer stuff that does all the goofy thermal boost. I don’t know enough about the various workloads, how many cores each uses, etc., to say much else.

Maybe Intel is confident enough in stability at higher clocks, but decided to target lower power consumption, and if you want to bump up the voltage, it’ll just go faster… and be hotter etc. Assuming all-around competence, that would make a lot of sense. Assuming that further, maybe Windows assumes it needs to back off a little from hitting that limit.

Does your board have a voltage offset option?

Thanks! I originally thought that as well, particularly with the AVX512 workloads. I did testing disabling the AVX512, AVX2, etc., and the throttling due to voltage stayed the same. On this particular machine, there’s no way to tune the voltage in the BIOS, only through the XTU. When I revert the system to BIOS defaults, the 4.4 comes back. Very strange!

Messing with my Ivy Bridge system recently, I mistakenly disabled turbo because it was a setting labelled “power efficiency” or “disabled”. This thing barely has enough IPC to give me a respectable Fortnite experience (I mean, nothing can, because Fortnite turned into a joke, but you know, FPS)… and that’s with hardly anything else running.

My understanding is the issue is with the Windows scheduler itself. BIOS enabled high c-states might exacerbate the issue but I’ve been able to juice all my voltages and set my multipliers in BIOS on 3435x and the ~30% performance deficit persists whenever I use Windows.

Yeah, there’s kind of multiple, closely related, conversations happening here.

At least they finally acknowledged it. I’m still using the Balanced power plan, trying to keep low utilization/idle power consumption/temps down. The ASRock board also defaults to BIOS controls EPB. It still works for the most part but it does have problems ramping clock speed, and slow wake from sleep.

Fixed by Intel… I received a microcode and ME update for my SPR W2455X machine a week ago, which included some fixes from Intel for the W2400/3400 lag issue. Turns out the Balanced-profile performance bug is processor related and not Windows scheduling. Hopefully the microcode and ME updates will filter out to others on other boards to solve some of their performance issues. For the curious, I believe it’s linked to EPP, but I have not had a chance to dive in and take a deeper look.

Are you on 0x2b0004b1 or a newer version?
Was the microcode update via a vendor BIOS update or via Windows update?

MC = 0x2b000580

I have a Lenovo P5.

Hope this helps!

Realized I typed a bit too fast (getting dinner for kids)…

I had early access to S0CKT13A, 1.0.0.19, which is the latest production code for SPR on the Lenovo P5/P7 (W2400/W3400). This was also one of those “cannot go back to earlier” firmware updates, as both the recovery and primary BIOS on the machine were moved to 1.0.0.19.

Interesting, thanks. Looks like the publicly released intel-ucode for SPR is still 0x2b0004d0 (20231114). I hope they push out this new version soon so I can finally retire my EPP/EPB script :)
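
On Linux the running microcode revision can be read from /proc/cpuinfo and compared against a revision mentioned in this thread. A small sketch; current_microcode and ucode_at_least are made-up helper names, and 0x2b000580 is simply the revision reported above for the Lenovo P5, not a confirmed minimum for the fix.

```shell
#!/bin/sh
# Print the microcode revision of the first CPU listed in a cpuinfo file.
# $1 = cpuinfo file (defaults to /proc/cpuinfo).
current_microcode() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Succeed if the running revision is at least the wanted one.
# Both arguments are hex strings such as 0x2b0004d0.
ucode_at_least() {
    [ $(( $1 )) -ge $(( $2 )) ]
}
```

For example: ucode_at_least "$(current_microcode)" 0x2b000580 && echo "at or past 0x2b000580".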

(On the other hand, Supermicro has since switched back to OS controlled EPB since BIOS v2.0, I’m assuming they’re anticipating a microcode fix coming soon-ish.)

Have you noticed a difference?