AMD Epyc 7443 high package (die) temperature

Hi All,
I just bought a Supermicro H12SSL + AMD Epyc 7443 + 256 GB DDR4 + 4U CPU cooler from eBay.

The CPU cooler came with thermal paste pre-applied, so I installed it directly on the CPU without additional paste.

Everything worked fine on the first attempt: IPMI login, BMC and BIOS update, and I installed Windows 11 Pro for Workstations on a Samsung 980 Pro 1 TB for testing.

I have noticed that the CPU package (die) temperature is very high at idle, above 80°C, while CPU power draw is ~72 W.

I ran a stability test with OCCT, but the test stopped after 5 seconds when the die temperature reached 96°C.

Please advise: should I replace the thermal paste? Could this be a faulty CPU?

HWMonitor and CPU-z screenshots :

The 4U CPU cooler :

Thank you
LM

The CCD/core temps in your HWMonitor screenshot look fine - actually quite good, for Milan at idle. I think I’ve heard of the package temperature (Tctl) reading much too high because of a sensor driver issue - there are some ServeTheHome forum threads on that.

It used to affect old Linux kernels before 6.2 (k10temp driver) but I have no knowledge of Windows - perhaps you can update the AMD drivers?

You could try booting a Linux live-USB (with kernel 6.2+ - anything other than the RHEL-derived distros) and checking the output of sensors to see if it is different - I’d guess that it would be.
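If you want the same readings without the sensors tool, here is a minimal sketch that reads them straight from the kernel's hwmon sysfs interface. It assumes the k10temp driver is loaded and exposes the usual `name`, `temp*_input` (millidegrees Celsius) and `temp*_label` files; the function name `read_hwmon_temps` is just made up for this example.

```python
from pathlib import Path


def read_hwmon_temps(chip_name="k10temp", root="/sys/class/hwmon"):
    """Return {label: degrees_C} for every temperature input of matching hwmon chips."""
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Skip hwmon devices that belong to other drivers (nvme, NICs, ...).
        if not name_file.is_file() or name_file.read_text().strip() != chip_name:
            continue
        for temp_input in sorted(chip.glob("temp*_input")):
            label_file = chip / temp_input.name.replace("_input", "_label")
            # Fall back to the file name when the driver provides no label.
            label = (label_file.read_text().strip()
                     if label_file.is_file() else temp_input.name)
            # sysfs reports temperatures in millidegrees Celsius.
            readings[label] = int(temp_input.read_text().strip()) / 1000.0
    return readings


if __name__ == "__main__":
    for label, temp_c in read_hwmon_temps().items():
        print(f"{label}: {temp_c:.1f} °C")
```

On a single-socket board the labels (Tctl, Tccd1, ...) are unique; on a dual-socket system the second chip would overwrite the first in this simple dictionary, so treat it as a starting point only.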

edit: The fix for Linux was the commit "hwmon: (k10temp) Check range scale when CUR_TEMP register is read-write" in the Linux kernel stable tree (kernel/git/stable/linux.git).
This shows a similar high Tctl value which was actually a false reading caused by a bug. Check if your IPMI sensors show the same value - if not, then I’d say that the Windows driver has a similar issue and needs to be updated.
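To illustrate why a bug like this produces a reading that is wrong by a fixed amount, here is a rough sketch of the decoding. The bit positions are assumptions based on my recollection of the k10temp driver (temperature field from bit 21 in 0.125°C steps, a range-select bit at bit 19, a two-bit "TJ select" field at bits 16-17), so check the driver source before relying on them. When the register reports on the shifted range, 49°C has to be subtracted; a decoder that misses one of the conditions reads about 49°C too high.

```python
# Assumed register layout (from memory of the k10temp driver, not authoritative).
CUR_TEMP_SHIFT = 21        # temperature field starts at bit 21, 0.125 °C per step
RANGE_SEL_BIT = 1 << 19    # set: value is on the shifted (-49 °C offset) range
TJ_SEL_MASK = 0x3 << 16    # both bits set: register is in read-write mode


def decode_tctl(regval, check_tj_sel=True):
    """Decode a raw temperature register value into degrees Celsius.

    check_tj_sel=False mimics a decoder that only looks at the range-select
    bit and ignores the read-write (TJ select) case.
    """
    temp_milli_c = (regval >> CUR_TEMP_SHIFT) * 125
    shifted = bool(regval & RANGE_SEL_BIT)
    if check_tj_sel and (regval & TJ_SEL_MASK) == TJ_SEL_MASK:
        shifted = True
    if shifted:
        temp_milli_c -= 49000  # remove the 49 °C offset of the shifted range
    return temp_milli_c / 1000.0


if __name__ == "__main__":
    # Hypothetical raw value: field encodes 82.0 °C, TJ select bits both set.
    raw = (656 << CUR_TEMP_SHIFT) | TJ_SEL_MASK
    print("without TJ_SEL check:", decode_tctl(raw, check_tj_sel=False))
    print("with TJ_SEL check:   ", decode_tctl(raw, check_tj_sel=True))
```

The raw value here is invented for illustration, but a constant offset of about 49°C would be consistent with an idle reading above 80°C when the real temperature is in the 30s.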


Thank you @xzpfzxds for your reply,

I used a Kali Linux USB with kernel 6.6 and ran some tests. As you explained, the CPU temperature was in the mid 30s (33-37°C) at idle; during a stress test with s-tui it peaked at 62°C but mostly stayed around 56°C.

Idle

under stress


Power consumption during the stress test was about 250 W: 21 W was used by the GPU, and the rest by the CPU, motherboard, 2x NVMe and 1x SSD.

It seems the cooling is doing ok :slight_smile: , thank you for the tip :handshake:

I have noticed that some cores reach the maximum frequency of 4.024 GHz at idle, but during the stress test, with all cores at 100% load, the frequency does not exceed ~3.8 GHz.

Is this a normal limitation?

Thanks
LM

Those s-tui screenshots look fine to me. Thermals actually look very good.

Yes, it is normal. For the EPYC 7443, AMD officially advertises a 4.0GHz boost clock and a 2.85GHz base clock.

That means, given good thermals, you can expect a single core to reach 4.0GHz at 100% load, and all cores to sustain at least 2.85GHz at 100% load.

For EPYCs the base clock is very conservative, and in practice you’ll see higher all-core frequencies - like in your case. Usually the base clock is measured with a worst-possible stress test (i.e., integer + float + AVX + cache units fully utilized) and at the thermal limit.
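If you want to compare idle and all-core clocks yourself on Linux, a small sketch like this reads the per-core frequency from the cpufreq sysfs files. It assumes a cpufreq driver is active so that `scaling_cur_freq` (in kHz) exists for each core; the function names are invented for this example.

```python
from pathlib import Path


def read_core_freqs_mhz(root="/sys/devices/system/cpu"):
    """Return {cpu_id: current frequency in MHz} from cpufreq sysfs."""
    freqs = {}
    for freq_file in Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu_id = int(freq_file.parent.parent.name[3:])  # "cpu12" -> 12
        # scaling_cur_freq is reported in kHz.
        freqs[cpu_id] = int(freq_file.read_text().strip()) / 1000.0
    return freqs


def summarize(freqs):
    """Return (min, max, mean) frequency in MHz across all cores."""
    values = list(freqs.values())
    if not values:
        raise ValueError("no cpufreq data found")
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    lo, hi, mean = summarize(read_core_freqs_mhz())
    print(f"min {lo:.0f} MHz, max {hi:.0f} MHz, mean {mean:.0f} MHz")
```

Run it once at idle and once during the stress test: at idle the max should sit near the boost clock on a few cores, while under full load the min should stay above the base clock.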

For reasons I’ve never found, every Milan I’ve seen that AMD advertises with a 4.0GHz boost clock limit actually boosts to 4.025GHz or 4.050GHz. Not a problem, you just get a bit more for free :slight_smile:


Thanks a lot @xzpfzxds, it would have been nice if all cores could run at 4.0 GHz at 100% load :grinning:

Do you think VMs in Proxmox can benefit from CPU frequency boost? (Windows or Linux VMs)

I will probably slow down the case fans. I still need to find out how to set up the Noctua fans so the BIOS does not think they are stopped when running at low RPM.

Regards
LK
