Threadripper 9960X CPU Cache L3 Errors

Hako · December 31, 2025, 6:40pm

Hi guys,

Please help me figure this out.

Recently my relatively new (about 3 months old) TR system has been throwing errors here and there, and I have no idea what they mean or which component could be defective.

HWiNFO64 informs me of a WHEA error.
Checking Event Viewer tells me:

A corrected hardware error has occurred.

Component: Memory
Error Source: Corrected Machine Check
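
For anyone triaging similar events, the fields in the event text above can be pulled out with a few lines of Python. This is only a rough sketch: the function name `parse_whea_text` is made up for illustration, and it assumes the event text was copied out of Event Viewer as plain "Key: Value" lines like the ones quoted here.

```python
def parse_whea_text(text: str) -> dict[str, str]:
    """Collect 'Key: Value' lines from copied Event Viewer text.

    Hypothetical helper: assumes one field per line, separated by a colon.
    Lines without a colon (such as the summary sentence) are skipped.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample text taken from the event quoted in this post.
sample = """A corrected hardware error has occurred.

Component: Memory
Error Source: Corrected Machine Check
"""

event = parse_whea_text(sample)
print(event["Component"], "|", event["Error Source"])
```

Collecting the fields this way makes it easier to count how often each component shows up if more events accumulate over time.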

HWiNFO64 also informs me of a “CPU Cache L3 Error”.

Can someone tell me if this is the CPU, motherboard or RAM failing?

In case it matters this is my system:
TR 9960X Stock
Gigabyte TRX50 Aero D Rev 1.2 BIOS: FA3e
128GB RAM G.Skill 6000MT/s CL30 EXPO
Nv 3080 Ti
Intel Arc Pro B50
EVGA SuperNOVA 1000 G6
W11 LTSC

I cannot reproduce a situation in which these errors happen.
They happen randomly.

Hopefully someone can help me figure this out.

Strawberry · December 31, 2025, 8:06pm

Your power supply might be struggling with this. Not because 1 kW isn’t enough on paper (it’s borderline), but because of the spiky power draw of these components. A voltage dropping very, very slightly under load can cause the sorts of errors that your enterprise gear can deal with (but would probably result in a BSOD if you weren’t using Threadripper and ECC memory).
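
To make the "borderline on paper" point concrete, here is a rough back-of-the-envelope sketch. Every wattage below is an assumed nominal figure rather than a measurement from this system, and `transient_factor` is a hypothetical multiplier standing in for short load spikes, so treat the result as illustrative only.

```python
def power_headroom(psu_watts: float, loads: dict[str, float],
                   transient_factor: float = 1.0) -> float:
    """Return PSU capacity left (in watts) after the summed loads are scaled.

    transient_factor is a hypothetical multiplier for brief power spikes;
    1.0 means steady-state draw only.
    """
    return psu_watts - sum(loads.values()) * transient_factor


# Assumed nominal figures, not measured values from this thread.
loads = {
    "TR 9960X (assumed 350 W TDP)": 350.0,
    "RTX 3080 Ti (assumed 350 W board power)": 350.0,
    "Arc Pro B50 (assumed 70 W board power)": 70.0,
    "board, RAM, drives, fans (rough guess)": 100.0,
}

steady = power_headroom(1000.0, loads)
spiky = power_headroom(1000.0, loads, transient_factor=1.5)
print(f"steady-state headroom: {steady:.0f} W")
print(f"headroom with 1.5x spikes: {spiky:.0f} W")
```

With these assumptions the steady-state budget fits, but a modest spike multiplier pushes the total past the PSU rating, which is why transient handling matters more than the label wattage.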

freecableguy · December 31, 2025, 8:27pm

Tell us more about your PSU (i.e., actual model). You really should have an ATX 3.0 or higher specification PSU for this system. Transient response for high excursion power events is critical for systems of this type.

Hako · December 31, 2025, 8:35pm

Updated start post with the PSU + link to manufacturer.

It’d be weird if it’s the PSU though, does this TR 9960X suck that much more power than a TR 3960X?
I had the PSU running my previous TR 3960X system just fine…
Can’t really check that for now though unless I get a new one…

rjsams · December 31, 2025, 9:57pm

Since this is used, have you checked to see if the previous owner enabled any form of overclocking in the BIOS?

If you (or the previous owner) poked any OC or power settings, these chips can pull a tremendous amount of power. Easily way out of spec for that PSU. If it looks like BIOS settings may have been changed, I’d be tempted to do a reset to defaults and see if the problems resolve.

Hako · December 31, 2025, 10:28pm

It’s not used, I bought it brand new.
There has been no overclock done.
Sorry for the confusion with the “relatively new” part. I just meant that it’s not that old yet.

Janos · January 1, 2026, 3:45am

Underclock the system and see what happens.

Hako · January 1, 2026, 10:12am

Update: I let Memtest5 0.13.1 with 1usmus’ config run overnight, and this morning a WHEA uncorrectable error occurred; I was greeted by a BSOD when I came into the room.

Seems like underclocking is the next step, yes.
Kinda annoying that you can only configure the first 2 CCDs with this board.

thro · January 1, 2026, 12:37pm

If you aren’t pushing much else in the system (and MemTest won’t be) I’d RMA the chip at this point.

Your PSU should be plenty; MemTest won’t be pushing the GPU at all or anything else. It’s not like this is a top-end Threadripper either.

If the chip can’t run memtest properly on a 1000W PSU and keeps getting L3 cache failures, it’s junk.

I’d put dummy user mode on, stop fucking about and just RMA it. At the very least, initiate the process with your reseller or direct with AMD as appropriate - sooner rather than later so there’s a paper trail of when this sort of stuff started.

Sometimes parts are faulty. It happens.

Underclocking might help, but all that will prove is that it doesn’t run at rated frequency, and is still faulty.

Building PCs isn’t/shouldn’t be that hard, and L3 cache is internal to the CPU.

dahlia123 · January 1, 2026, 1:14pm

Can you check the memory module temperature? I saw the same error on my system before I put dedicated fans on the memory modules.

Hako · January 1, 2026, 2:01pm

It seems I can trigger the errors if I push the memory with Memtest5 on the EXPO profile; I found that out just now.

The temps of the modules rise to around 76 °C according to HWiNFO64.
Sadly this board is horrible in terms of space with the ARCTIC Freezer 4U-M; there is no space left, that I can see, to put a small fan anywhere near the RAM modules.

With New Year having just passed and retailers being closed for the rest of the week, I can only initiate the RMA next week.

For now I reduced the memory settings to something easier, and in the short test in which the EXPO profile would throw errors, it’s not showing any errors.

Crion · January 1, 2026, 3:25pm

BIOS and M/B revision?

Hako · January 1, 2026, 3:28pm

Updated first post with the information.
Rev 1.2
BIOS: FA3e

And welcome to the forum

Crion · January 1, 2026, 7:15pm

Thanks, been lurking but finally made an account. ;D

Hako · January 3, 2026, 10:01am

Update: Sent the CPU back to the reseller, now the waiting game.