Intel Xeon Max 9480

Hi all,
I recently purchased an eBay special Xeon Max 9480 with a Supermicro MBD-X13SEI-F-(O/B). I bought this chip to replace my EPYC 7713 and ASRock Rack Rome server board so that I could experiment with llama.cpp (with AMX), bitnet.cpp and lavapipe at a higher thread count.

It feels quick in Windows, but when I run LM Studio (a fancy GUI for llama.cpp) across all 112 threads I get lower results than with 32 threads. I reckon the issue is that I am getting severely throttled: I was seeing 500MHz reported under a power virus (fully loaded llama.cpp). HWiNFO is reporting 260W CPU draw under load. Intel markets these chips at 350W, and the board has two 8-pin CPU headers for extra power. How can I let the CPU stay clocked higher, or set manual multipliers?

I fiddled with the power settings in the BIOS, but they're a bit cryptic, and none of the obvious Windows power plan tricks worked. Neither did ThrottleStop, and XTU refuses to even install due to a platform check. Any ideas?


How are you cooling it and what temperatures are you seeing? These CPUs will hit thermal throttle roughly 20C sooner than “normal” processors so they need significantly more cooling than their wattage would suggest.

Also, it might be hard to tune it to use the memory optimally in Windows; I think Linux might be needed to pin processes to the nearest HBM stack.
I’m eventually going to write a guide on how to do this once I figure out how to cool mine.
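
For anyone who wants to try the Linux route in the meantime, here is a minimal sketch for seeing how the memory is exposed. It assumes the HBM is configured so that it appears as separate NUMA nodes without CPUs (which depends on the BIOS memory mode) and that the kernel exposes the usual `/sys/devices/system/node` tree. The function names are made up for illustration.

```python
from pathlib import Path


def read_numa_nodes(root="/sys/devices/system/node"):
    """Return {node_id: cpulist_string} for every nodeN directory under root."""
    nodes = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        cpulist_file = node_dir / "cpulist"
        if cpulist_file.is_file():
            # cpulist is empty for a memory-only node.
            nodes[int(node_dir.name[4:])] = cpulist_file.read_text().strip()
    return nodes


def cpuless_nodes(nodes):
    """Node ids with no CPUs attached (assumption: these are the HBM nodes)."""
    return sorted(node for node, cpus in nodes.items() if not cpus)
```

If `cpuless_nodes(read_numa_nodes())` returns a non-empty list, those node ids are candidates to pass to `numactl --membind=<node>` alongside `--cpunodebind=<node>` for the matching cores. Which memory node sits closest to which cores is not something this sketch works out; `numactl --hardware` prints the distance table.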


Thanks for replying, here are some new metrics:

  • my BIOS tweaks have the chip bottoming out at 1.4GHz
  • I have a beefy air cooler with push/pull fans in an open case
  • max temp per ThrottleStop and hwmon is 60C
  • avg power draw per hwmon is 260W with a max of 314W (I assume this is a 30 sec boost)
  • Windows is on the High Performance power plan with minimum CPU @ 100%
  • cores range from 50-54C across the die

Those temps should be good so I doubt they are the problem. Have you tried disabling C-states in BIOS (or at least some of the higher C-states)?

For reference on power usage, running two of them on an ASUS Z13PE-D16, I’m seeing 1100-1200 watts at the wall under load. This is with one of the ASUS OC options on, though, but all cores will stay at 2.6GHz under load.


I’ll give that a go. I was wondering if that would keep the CPU from bumping to higher C-states under load. Testing right now.

Lots of permutations of the settings to go through:

  • I disabled SpeedStep
    • I think this is why the system now idles about 100W higher, at 214W
  • I set the package C-state to C0/C1
    • I was hoping it would stay pegged here
  • I disabled optimize power mode
  • Still throttling all-core down to 1.4GHz

Are there any software utils that you know of?

I’d keep SpeedStep on and make sure turbo mode is enabled. Do you have OS Controlled EPB set under Power Performance Tuning?

for tuning CPU performance?

Yes, I re-enabled SpeedStep and a few other features. I’ll add screenshots tomorrow if you’re still up for it, cheers.


What I know about Xeons is that they need a lot more cooling than most other processors, so you may need a more powerful cooler. Also, make sure you’ve disabled C-states and enabled turbo mode if available.


If you run passmark, what’s your score?

Tried disabling hyperthreading?

RAM speed?



I briefly hit 320W @2GHz but it was short-lived; then it clocked itself down to around 270W per HWmon.

I’ll try to get PassMark tonight. I didn’t disable hyperthreading; do you think this will keep the clocks high? I reckon HT will help with llama.cpp performance.

The RAM isn’t straightforward. I think it’s running the HBM2 in cache mode; the BIOS settings weren’t very clear. I’m also on Windows, so I’m not sure if Windows can even handle it appropriately. The only good sign is that the AIDA64 trial said I was doing 460GB/s on 2 DDR5 DIMMs, so I assume that’s the HBM caching.

Although I have the sneaking suspicion that the DDR5 isn’t being detected and it’s only seeing the HBM…
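
A quick back-of-the-envelope check on why 460GB/s can't be coming from the DDR5 alone. This sketch assumes the two DIMMs sit on separate channels and run at DDR5-4800 with a 64-bit (8-byte) data bus per channel; the actual DIMM speed isn't stated above, so treat the numbers as illustrative.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes/transfer * channels."""
    return mega_transfers_per_s * bus_bytes * channels / 1000


# Assumption: DDR5-4800, one DIMM per channel, two channels populated.
ddr5_peak = ddr_peak_gb_per_s(4800, channels=2)

# Figure reported by the AIDA64 trial in the post above.
measured = 460.0

print(f"DDR5 peak (assumed DDR5-4800 x2): {ddr5_peak:.1f} GB/s")
print(f"measured / DDR5 peak: {measured / ddr5_peak:.1f}x")
```

Under those assumptions the two DIMMs top out at roughly 77GB/s, so a measured figure about six times higher has to be served from the HBM. It doesn't tell you whether the DDR5 is being used at all, only that the benchmark wasn't limited by it.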

llama.cpp uses AVX2 extensions that have their own power limits; I wonder if this is what is tripping the CPU up? I wonder if AVX-512 and AMX have the same behavior?

It is hitting 350W on PassMark, albeit for a few seconds.
https://www.passmark.com/baselines/V11/display.php?id=226011215022


Here the 32T run has clocks consistently over 3GHz:

`(base) PS C:\Users\Cam\Desktop\repos\llama.cpp\build\bin\Release> llama-cli -m C:\Users\Cam.cache\lm-studio\models\NousResearch\Hermes-3-Llama-3.1-8B-GGUF\Hermes-3-Llama-3.1-8B.Q4_K_M.gguf -p "I believe the meaning of life is" -n 128 -t 32

llama_perf_sampler_print: sampling time = 24.83 ms / 136 runs ( 0.18 ms per token, 5476.58 tokens per second)
llama_perf_context_print: load time = 4542.30 ms
llama_perf_context_print: prompt eval time = 102.92 ms / 8 tokens ( 12.87 ms per token, 77.73 tokens per second)
llama_perf_context_print: eval time = 8018.75 ms / 127 runs ( 63.14 ms per token, 15.84 tokens per second)
llama_perf_context_print: total time = 8183.02 ms / 135 tokens`

With any more threads, the cores get downclocked a lot.
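
To find where the scaling falls over, a thread-count sweep is less tedious than rerunning by hand. Below is a rough sketch that reuses the same `llama-cli` flags as the run above (`-m`, `-p`, `-n`, `-t`) and pulls the generation speed out of the `llama_perf_context_print: eval time` line. The function names are invented for illustration, and it assumes `llama-cli` is on the PATH and exits by itself after generating; if your build drops into an interactive chat, the timeout will fire instead.

```python
import re
import subprocess

# Matches the generation line only; the "prompt eval time" line has "prompt" before "eval".
EVAL_RE = re.compile(
    r"llama_perf_context_print:\s+eval time\s*=.*?([\d.]+) tokens per second"
)


def parse_eval_tps(log_text):
    """Return generation tokens/second from llama.cpp perf output, or None."""
    match = EVAL_RE.search(log_text)
    return float(match.group(1)) if match else None


def build_command(model_path, threads, prompt="I believe the meaning of life is",
                  n_predict=128, binary="llama-cli"):
    """Build the same command line as the manual run, varying only -t."""
    return [binary, "-m", model_path, "-p", prompt,
            "-n", str(n_predict), "-t", str(threads)]


def sweep(model_path, thread_counts, timeout_s=600):
    """Run one generation per thread count and collect tokens/second."""
    results = {}
    for threads in thread_counts:
        proc = subprocess.run(
            build_command(model_path, threads),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # perf lines may arrive on either stream
            text=True,
            timeout=timeout_s,
        )
        results[threads] = parse_eval_tps(proc.stdout)
    return results
```

Something like `sweep(r"path\to\model.gguf", [16, 32, 48, 56, 112])` would then give a tokens/second figure per thread count, and logging clocks and package power alongside each run should show whether the drop lines up with the throttling.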
