Intel Xeon Max 9480

monstercameron · November 28, 2024, 4:05am

Hi all,
I recently purchased an eBay special Xeon Max 9480 with a Supermicro MBD-X13SEI-F-(O/B). I bought this chip to replace my EPYC 7713 and ASRock Rack Rome server board so that I could experiment with llama.cpp (with AMX), bitnet.cpp and lavapipe at a higher thread count.

It feels quick in Windows, but when I run LM Studio (a fancy GUI for llama.cpp) across all 112 threads I get lower results than with 32 threads. I reckon the issue is that I am getting severely throttled: I was seeing 500MHz reported under a power virus (fully loaded llama.cpp). HWiNFO is reporting 260W CPU draw under load. Intel markets these chips at 350W and the board has two 8-pin CPU headers for extra power. How can I let the CPU stay clocked higher or set manual multipliers?

I fiddled with the power settings in the BIOS, but it's a bit cryptic, and none of the obvious Windows power plan tricks worked. Neither did ThrottleStop, and XTU refuses to even install due to a platform check. Any ideas?

twin_savage · November 28, 2024, 4:15am

How are you cooling it and what temperatures are you seeing? These CPUs will hit thermal throttle roughly 20C sooner than “normal” processors so they need significantly more cooling than their wattage would suggest.

Also, it might be hard to tune it to use the memory optimally in Windows; I think Linux might be needed to pin processes to the nearest HBM stack.
I’m eventually going to write a guide on how to do this once I figure out how to cool mine.
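
A minimal sketch of a first step on Linux, assuming the HBM is configured in flat mode so that it shows up as CPU-less NUMA nodes (an assumption; nothing in this thread confirms the memory mode). It reads the standard `/sys/devices/system/node` tree and separates nodes that have CPUs from memory-only nodes. The helper names are made up for illustration.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def split_nodes(cpulists):
    """Split {node_id: cpulist_text} into (nodes with CPUs, memory-only node ids)."""
    with_cpus = {}
    memory_only = []
    for node_id, text in cpulists.items():
        cpus = parse_cpulist(text)
        if cpus:
            with_cpus[node_id] = cpus
        else:
            memory_only.append(node_id)
    return with_cpus, sorted(memory_only)


def read_sysfs_cpulists(root="/sys/devices/system/node"):
    """Read each node's cpulist file; returns {} if the tree does not exist."""
    out = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as f:
            out[node_id] = f.read()
    return out


if __name__ == "__main__":
    with_cpus, memory_only = split_nodes(read_sysfs_cpulists())
    for node_id, cpus in sorted(with_cpus.items()):
        print(f"node {node_id}: {len(cpus)} CPUs")
    # Under the flat-mode assumption these are the HBM candidates.
    print("memory-only nodes:", memory_only)
```

From there, something like `numactl --cpunodebind=<cpu node> --membind=<memory-only node>` could pin a process. Which memory-only node is nearest to which CPU node would still need checking against the distance table from `numactl --hardware`.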

monstercameron · November 28, 2024, 4:25am

thanks for replying, here are some new metrics:

my BIOS tweaks have the chip bottoming out at 1.4GHz
I have a beefy air cooler with push/pull fans in an open case
max temp per ThrottleStop and hwmon is 60C
avg power draw per hwmon is 260W with a max of 314W (I assume this is a 30 sec boost)
Windows is on the High Performance power plan with minimum CPU @ 100%
CPUs range from 50-54C across the die
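
A toy model of the "30 sec boost" guess above, in case it helps to picture it. Intel power limits are usually described as a long-term limit (PL1) enforced over a time window plus a higher short-term limit (PL2). The sketch below tracks a running average of package power and drops from PL2 to PL1 once that average reaches PL1. The function name and the numbers (260W, 320W, 30s) are illustrative assumptions loosely borrowed from the readings in this post, not the limits actually configured on this board, and real firmware behaves in more complicated ways.

```python
import math


def simulate_power(pl1_w, pl2_w, tau_s, demand_w, seconds, dt=1.0):
    """Return the per-step package power allowed by a toy PL1/PL2 model."""
    alpha = 1.0 - math.exp(-dt / tau_s)  # weight of one step in the running average
    avg = 0.0                            # running average; starts at 0 for simplicity
    trace = []
    for _ in range(int(seconds / dt)):
        # Boost to PL2 while the running average is still below PL1.
        cap = pl2_w if avg < pl1_w else pl1_w
        power = min(demand_w, cap)
        avg += alpha * (power - avg)
        trace.append(power)
    return trace


if __name__ == "__main__":
    # Hypothetical limits: PL1=260W, PL2=320W, 30s window, load wants 400W.
    trace = simulate_power(260, 320, 30, 400, 300)
    boost_steps = sum(1 for p in trace if p == 320)
    print(f"boosted for {boost_steps}s, then settled at {trace[-1]}W")
```

If the real limits work anything like this, a short spike above 300W followed by a flat 260W would be the expected shape rather than a sign of thermal trouble.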

twin_savage · November 28, 2024, 4:30am

Those temps should be good so I doubt they are the problem. Have you tried disabling C-states in BIOS (or at least some of the higher C-states)?

For reference on power usage: running two of them on an ASUS Z13PE-D16, I’m seeing 1100-1200 watts at the wall under load. This is with one of the ASUS OC options on, but all cores will stay at 2.6GHz under load.

monstercameron · November 28, 2024, 4:32am

I’ll give that a go. I was wondering if that would keep the CPU from bumping to higher C-states under load. Testing rn.

monstercameron · November 28, 2024, 5:02am

Lots of permutations of the settings to go through:

I disabled SpeedStep
- I think this is why the system now idles 100W higher, at 214W
I set the package C-state to C0/C1
- I was hoping it would stay pegged here
I disabled optimize power mode
Still throttling all-core down to 1.4GHz

Are there any software utils that you know of?

twin_savage · November 28, 2024, 5:10am

I’d keep speedstep on and make sure turbo mode is enabled. Do you have OS Controlled EPB set under the Power Performance Tuning?

for tuning CPU performance?

monstercameron · November 28, 2024, 5:28am

Yes, I re-enabled SpeedStep and a few other features. I'll add screenshots tomorrow if you're still up for it, cheers

fDrot · November 28, 2024, 12:14pm

What I know about Xeons is that they need a lot more cooling than most other processors, so you may need a more powerful cooler. Also, make sure you’ve disabled C-states and enabled turbo mode if available.

TryTwiceMedia · November 28, 2024, 1:18pm

If you run passmark, what’s your score?

Tried disabling hyperthreading?

RAM speed?

monstercameron · November 30, 2024, 6:30am

I briefly hit 320W @ 2GHz but it was short-lived, then it clocks itself down to around 270W per hwmon

I'll try to get PassMark tonight. I didn't disable hyperthreading; do you think this will keep the clocks high? I reckon HT will help with llama.cpp performance.

The RAM isn't straightforward. I think it's running the HBM2e in cache mode; the BIOS settings weren't very clear. I'm also on Windows, so I'm not sure if Windows can even handle it appropriately. The only good sign is that the AIDA64 trial said I was doing 460 GB/s on 2 DDR5 DIMMs, so I assume that's the HBM caching.

Although I have the sneaking suspicion that the DDR5 isn't being detected and it's only seeing the HBM…
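
A quick sanity check on that 460 GB/s figure, as a rough sketch. It assumes DDR5-4800 (the DIMM speed is not given in this thread) and the usual 8 bytes of data per transfer per DIMM channel, then compares the theoretical peak of two channels against the AIDA64 reading.

```python
def ddr_peak_gb_per_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s for a number of DDR channels."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


ASSUMED_MT_PER_S = 4800   # assumption: DDR5-4800, not confirmed in the thread
MEASURED_GB_PER_S = 460   # AIDA64 reading quoted above

peak = ddr_peak_gb_per_s(ASSUMED_MT_PER_S, channels=2)
print(f"two-channel DDR5 peak: {peak:.1f} GB/s")
print(f"measured / peak: {MEASURED_GB_PER_S / peak:.1f}x")
```

Under these assumptions two DIMMs top out at about 76.8 GB/s, so a 460 GB/s reading cannot be coming from the DDR5 alone and is most likely HBM bandwidth. It does not say either way whether the DIMMs are detected.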

monstercameron · November 30, 2024, 6:35am

llama.cpp uses AVX2 extensions that have their own power limits; I wonder if this is what is tripping the CPU up? I wonder if AVX-512 and AMX have the same behavior?

It is hitting 350W on passmark, albeit for a few seconds.
https://www.passmark.com/baselines/V11/display.php?id=226011215022

monstercameron · November 30, 2024, 7:46am

Here the 32T run has clocks over 3GHz consistently

`(base) PS C:\Users\Cam\Desktop\repos\llama.cpp\build\bin\Release> llama-cli -m C:\Users\Cam.cache\lm-studio\models\NousResearch\Hermes-3-Llama-3.1-8B-GGUF\Hermes-3-Llama-3.1-8B.Q4_K_M.gguf -p "I believe the meaning of life is" -n 128 -t 32

llama_perf_sampler_print: sampling time = 24.83 ms / 136 runs ( 0.18 ms per token, 5476.58 tokens per second)
llama_perf_context_print: load time = 4542.30 ms
llama_perf_context_print: prompt eval time = 102.92 ms / 8 tokens ( 12.87 ms per token, 77.73 tokens per second)
llama_perf_context_print: eval time = 8018.75 ms / 127 runs ( 63.14 ms per token, 15.84 tokens per second)
llama_perf_context_print: total time = 8183.02 ms / 135 tokens`

With any more threads, the clocks drop a lot
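
To put numbers on that instead of eyeballing clocks, here is a small sketch that reruns the same command at several thread counts and pulls the generation speed out of the `llama_perf_context_print: eval time` line shown above. It only uses the flags from that run (`-m`, `-p`, `-n`, `-t`). The thread counts, function names and command-line handling are made up for illustration, and it assumes `llama-cli` exits on its own after generating, as it did in that run.

```python
import re
import subprocess
import sys

# Matches the generation line only; the "prompt eval time" line does not match
# because "eval time" must follow the prefix directly.
EVAL_RE = re.compile(
    r"llama_perf_context_print:\s+eval time\s*=.*?([\d.]+)\s+tokens per second"
)


def parse_eval_tps(output):
    """Return generation tokens/second from llama-cli perf output, or None."""
    match = EVAL_RE.search(output)
    return float(match.group(1)) if match else None


def run_once(llama_cli, model, threads, n_predict=128):
    """Run llama-cli once with the flags used in the post above."""
    cmd = [llama_cli, "-m", model, "-p", "I believe the meaning of life is",
           "-n", str(n_predict), "-t", str(threads)]
    # Perf lines may go to stderr, so merge both streams before parsing.
    proc = subprocess.run(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True, errors="replace")
    return parse_eval_tps(proc.stdout)


# Usage (hypothetical): python sweep.py <path to llama-cli> <path to model.gguf>
if __name__ == "__main__" and len(sys.argv) == 3:
    for threads in (16, 32, 56, 112):
        print(threads, "threads:", run_once(sys.argv[1], sys.argv[2], threads), "tok/s")
```

A single run per thread count is noisy, so repeating each one a few times would give a fairer comparison.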

monstercameron · December 17, 2024, 3:11am

I was able to update the BIOS, and it can now hold an all-core workload above 1GHz. Still not quite fast enough.

still testing different config

twin_savage · December 17, 2024, 3:19am

I forgot to ask, what power plan/profile was set in Windows? I was able to get the 2.6GHz behavior on mine while on the “High Performance” plan, I’d imagine any of the lower ones might cause problems.

I recently tried the old pre-15th gen XTU utility and it installed, but it freezes on launch.

TryTwiceMedia · December 17, 2024, 4:34am

What are the thermals?
What is your thermal solution?

I am genuinely suspecting thermal throttling now that you’ve gotten it up and over the 2 GHz barrier
@JayVenturi is the only guy I’ve seen cool a 300+ watt CPU in a workstation case without fans screaming.

monstercameron · December 17, 2024, 5:07am

It's the High Performance power plan, and in the EFI I have it set to the virtualization profile; the HPC profile was throttling hard

monstercameron · December 17, 2024, 5:18am

Hey friend, the temps seem fine unless the sensors are faulty. I have a beefy air cooler and quite a few case fans

twin_savage · December 17, 2024, 5:29am

It might be worth it to try pulling out both sticks of RAM and running loads to see if you get higher clocks and power draw without any DIMMs installed. It’s possible having so few channels populated is doing something weird.

Your situation is odd because my CPUs have no problem pulling well above 350 watts without even getting into a heavily threaded workload.

monstercameron · December 17, 2024, 5:38am

I reckon I got it working stably now at 2.6GHz

So beautiful, like a Christmas tree

I updated the BIOS to:

BIOS Firmware Version: 2.5
BIOS Build Time: 10/25/2024

I’ll share my EFI settings shortly.

Next mission is to get my Gen 4 PCIe bifurcation card working with four Gen 4 NVMe SSDs and get full-speed RAID on Windows.

I also don't know if the system even detects my DDR5 DIMMs. Is the HBM supposed to be transparent to Windows?