AMD Epyc 7443 high package (die) temperature

Hi All,
I just bought a Supermicro H12SSL + AMD EPYC 7443 + 256 GB DDR4 + 4U CPU cooler from eBay.

The CPU cooler came with thermal paste pre-applied, so I installed it directly on the CPU without additional paste.

Everything worked fine on the first attempt: IPMI login, BMC and BIOS update, and installing Windows 11 Pro for Workstations on a Samsung 980 Pro 1 TB for testing.

I have noticed that the CPU package (CPU die) temperature is very high at idle, above 80°C, while the CPU power draw was about 72 W.

I ran a stability test with OCCT, but the test stopped within 5 seconds, when the die temperature reached 96°C.

Please advise: should I replace the thermal paste? Could this be a faulty CPU?

HWMonitor and CPU-z screenshots :

The 4U CPU cooler :

Thank you
LM

The CCD/core temps in your HWMonitor screenshot look fine - actually quite good for Milan at idle. I think I've heard of the package temperature (Tctl) reading much higher because of a sensor driver issue - there are some ServeTheHome forum threads on that.

It used to affect old Linux kernels before 6.2 (k10temp driver) but I have no knowledge of Windows - perhaps you can update the AMD drivers?

You could try booting a Linux live USB (with kernel 6.2+ - anything other than the RHEL-derived distros) and checking the output of the sensors command to see if it is different - I'd guess that it would be.

Edit: The fix for Linux was the commit "hwmon: (k10temp) Check range scale when CUR_TEMP register is read-write" in the Linux kernel stable tree (kernel/git/stable/linux.git).
This shows a similar high Tctl value which was actually a false reading caused by a bug. Check if your IPMI sensors show the same value - if not, then I’d say that the Windows driver has a similar issue and needs to be updated.
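
If you would rather script the Linux check than read the sensors output by eye, here is a minimal sketch. It assumes the standard hwmon sysfs layout (a `name` file plus `tempN_input` / `tempN_label` files per device, with values in millidegrees Celsius); the function name `read_hwmon_temps` is made up for illustration and is not part of any tool.

```python
from pathlib import Path


def read_hwmon_temps(hwmon_root="/sys/class/hwmon", driver="k10temp"):
    """Return {label: degrees_C} for every hwmon device whose name matches `driver`.

    Assumes the usual sysfs layout: <root>/hwmonN/name, tempM_input, tempM_label.
    """
    temps = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return temps  # not Linux, or hwmon is not available
    for dev in sorted(root.iterdir()):
        name_file = dev / "name"
        if not name_file.is_file() or name_file.read_text().strip() != driver:
            continue
        for input_file in sorted(dev.glob("temp*_input")):
            label_file = dev / input_file.name.replace("_input", "_label")
            # Fall back to the file name when the driver exposes no label.
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = input_file.name
            # hwmon reports temperatures in millidegrees Celsius.
            temps[label] = int(input_file.read_text().strip()) / 1000.0
    return temps
```

Calling `read_hwmon_temps()` on the live USB returns a dict keyed by sensor label (Tctl and the per-CCD labels, if the driver exposes them), which you can then compare against what IPMI and Windows report.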

Thank you @xzpfzxds for your reply,

I booted a Kali Linux USB with kernel 6.6 and ran some tests. As you explained, the CPU temperature was in the mid 30s (33-37°C) at idle. During a stress test with s-tui it peaked at 62°C but mostly stayed around 56°C.

Idle

under stress


The power consumption during the stress test was about 250 W; 21 W was used by the GPU and the rest by the CPU, motherboard, 2x NVMe and 1x SSD.
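
Spelling out that arithmetic as a tiny sketch (the helper name `non_gpu_power` is just for illustration, and the figures are the approximate readings quoted above):

```python
def non_gpu_power(total_w, gpu_w):
    """Power left for everything except the GPU, in watts."""
    if gpu_w > total_w:
        raise ValueError("GPU power cannot exceed total power")
    return total_w - gpu_w


# ~250 W total during the stress test, 21 W of it drawn by the GPU.
print(non_gpu_power(250, 21))  # 229
```

So roughly 229 W is shared between the CPU, motherboard and drives.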

It seems the cooling is doing ok :slight_smile: , thank you for the tip :handshake:

I have noticed that some cores reach the maximum frequency of 4.024 GHz at idle, but during the stress test, with all cores at 100% load, the frequency does not exceed ~3.8 GHz.

Is this a normal limitation ?

Thanks
LM

Those s-tui screenshots look fine to me. Thermals actually look very good.

Yes, it is normal. For the EPYC 7443, AMD officially advertises a 4.0 GHz boost clock and a 2.85 GHz base clock.

That means that, given good thermals, the minimum you can expect is a single core running at 4.0 GHz at 100% load, and all cores running at 2.85 GHz at 100% load.

For EPYCs the base clock is very conservative, and in practice you'll see higher frequencies - like in your case. The base clock is usually measured with a worst-possible stress test (i.e., integer + float + AVX + cache units fully utilized) and at the thermal limit.

For reasons I've never found, every Milan I've seen that AMD advertises with a 4.0 GHz boost clock limit actually reaches 4.025 GHz or 4.050 GHz. Not a problem, you just get a bit more for free :slight_smile:
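
If you want to check the per-core clocks against those two figures yourself, here is a rough sketch that parses the "cpu MHz" lines Linux exposes in /proc/cpuinfo on x86. The function names and the dict keys are made up for illustration; the 2850 / 4000 MHz constants are just the advertised base and boost clocks mentioned above.

```python
import re

BASE_MHZ = 2850.0   # advertised base clock quoted in this thread
BOOST_MHZ = 4000.0  # advertised boost clock quoted in this thread


def parse_core_mhz(cpuinfo_text):
    """Extract per-core clock speeds (MHz) from /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def summarize(core_mhz):
    """Return min/max clocks and counts relative to the base and boost clocks."""
    if not core_mhz:
        raise ValueError("no 'cpu MHz' lines found")
    return {
        "min_mhz": min(core_mhz),
        "max_mhz": max(core_mhz),
        "cores_at_or_above_base": sum(1 for f in core_mhz if f >= BASE_MHZ),
        "cores_above_advertised_boost": sum(1 for f in core_mhz if f > BOOST_MHZ),
    }
```

On the live system you would feed it the file contents, e.g. `summarize(parse_core_mhz(open("/proc/cpuinfo").read()))`, once at idle and once while the stress test is running, and compare the two results.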

Thanks a lot @xzpfzxds, it would have been nice if all cores could run at 4.0 GHz at 100% load :grinning:

Do you think VMs in Proxmox can benefit from CPU frequency boost? (Windows or Linux VMs)

I will probably slow down the case fans. I still need to find out how to set up the Noctua fans so that the BIOS does not think they have stopped when they run at low RPM.

Regards
LK