AMD Threadripper 3970X under heavy AVX2 load: Defective design? (No, but there is an issue)

I sure hope that is a real AMD rep! This is a serious issue and I hope it will be fixed in future BIOS or AGESA updates. My entire workload depends on AVX512, and I just plunked down $5000+ on a 3970X build. Haven’t put it together yet, but I’m dreading having it crash or, worse, output incorrect calculations.

Are you trolling? Your whole workload depends on AVX512, but you bought a CPU that doesn’t support AVX512, never did, and was never advertised as supporting it.

Sorry I meant the smaller AVX, I believe 256?

I’m not really sure about the technical naming here, I just know my software uses AVX.

I’ve tried disabling Spread Spectrum in my BIOS (it was set to Default, whatever that means) but it didn’t help. Prime95 with min/max FFT sizes set to 16K still fails instantly (within a fraction of a second).

A quick LinkedIn search reveals that AMD’s director of corporate communications is called Andrew Prairie, usually referred to as Drew.
The mail address in his post also seems pretty legit, seeing as it is listed in several official AMD documents.

He’s legit. We got in touch with him and AMD is currently trying to repro the problem on their end. Keeping you posted.

Yes, confirmed.

You mean AVX2?

If I want to test this under Linux (EPYC), which program and settings should I use? mprime?

I used option 16, then option 2, for most of my testing, on all available threads. mprime from the CLI is fine; I tested on both Windows and Linux.

I also ran the test on a 3970X with the AORUS EXTREME.
PBO is enabled and the CPU gets very hot (95 °C), but no crash.
The CPU is water-cooled with a 360 rad.
RAM is running at 3733 in sync with the Infinity Fabric.

What’s your power draw? A 360mm rad might not be enough, or your cold plate might not be making good enough contact.

Ran for 15 mins on my 3970X. PBO enabled. Zenith II Extreme. No crash.

The cold plate contact should be good; the load is just quite extreme.
Power draw from the wall is between 86W and 100W at idle; with the test running it’s 725W.
That’s much higher than every other load I’ve tested so far. Two 2160p encodes in parallel with SVT-HEVC are not as demanding, but the CPU still gets over 80°C at 22°C room temperature, though that’s with 4.2GHz on all cores.
I think that’s maybe a bit too much load for a single 360.
Normal stuff like compiling gets the CPU to 70°C max.

Yeah. I was at 94C with 3x 360 rads lol. There’s just so much heat coming from this workload. Hard to overcome it even with water.

“‘A radiator’ isn’t a very good metric, as radiators range in size from 1x80mm to 24x140mm.

Going by Skinneelabs’ tests (source), a 360 rad (3x120mm) can dissipate 650W with 3400rpm fans.

That’s too noisy for most people: How much radiator you need depends on how quiet you want it, as well as how much cooling you want. You can get less noise by sacrificing cooling performance. This graph is for 10°C air-water delta, which Skinneelabs describes as “average good performance”. Hotter water will dissipate more watts - twice the air-water delta, twice the watts - but the hardware temps will rise correspondingly.

Say I want lower noise for the same cooling performance: Following the Skinneelabs graph down to acceptable noise levels - e.g. 1400rpm - we have around 400W dissipated at 1400rpm for a 3x120mm radiator, or 133W per 120mm rad/fan slot. So at 1400rpm, 650 watts (of actual power) would need about 5x120mm of radiator.

If we go by the “150W per 120mm rad” rule of thumb instead, just because it seems to be the most commonly accepted, a 480 (4x120mm) or 420 (3x140mm) rad should be adequate, and those are sold as “one radiator”.

It’s a tradeoff: You can get by with very little rad if you accept the combination of poor cooling and exceedingly noisy fans. Conversely, you need a lot of rad if you want great cooling and low noise at the same time. It’s your performance/noise/space tradeoff to make, but the “150W per 120mm rad” rule of thumb is a decent starting point, adjust to taste.

Footnote: The TDP rarely matches actual power consumption, and it’s the actual power that needs cooling. Actual power consumption is normally less than the TDP even under heavy load, but overclocking/overvolting can push it quite a bit above the TDP. I would stick with TDP for simplicity. With CPU and GPUs in the same loop there’s some additional leeway in that it’s very rare to load everything to the max at the same time.”
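
For anyone who wants to redo the radiator arithmetic from the quoted post with their own numbers, here is a rough Python sketch. The function names are made up for illustration, the 400W and 650W figures are the ones quoted above, and the linear scaling with air-water delta is taken from the post rather than measured, so treat the output as a ballpark only.

```python
# Rough radiator sizing using the figures from the quoted post.
# Function names are hypothetical helpers, not part of any tool or library.

def watts_per_slot(total_watts: float, slots: int) -> float:
    """Heat dissipated per 120mm fan slot, given a total for a whole radiator."""
    return total_watts / slots


def slots_needed(heat_watts: float, per_slot_watts: float,
                 delta_c: float = 10.0, ref_delta_c: float = 10.0) -> float:
    """Fractional number of 120mm slots needed to shed heat_watts.

    Assumption (from the quoted post): dissipation scales linearly with the
    air-water delta, i.e. twice the delta gives twice the watts.
    """
    scaled = per_slot_watts * (delta_c / ref_delta_c)
    return heat_watts / scaled


if __name__ == "__main__":
    quiet = watts_per_slot(400, 3)  # 3x120mm rad at 1400rpm, per the post
    print(f"per slot at 1400rpm: {quiet:.0f} W")
    print(f"slots for 650 W at 1400rpm: {slots_needed(650, quiet):.1f}")
    print(f"slots for 650 W, 150 W rule: {slots_needed(650, 150):.1f}")
    print(f"slots for 650 W at 20C delta: {slots_needed(650, quiet, delta_c=20):.1f}")
```

With the 1400rpm figure this comes out just under five 120mm slots for 650W, in line with the “about 5x120mm” estimate above. Raising `delta_c` shows how much radiator you can trade away by accepting warmer water.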

It’s a single 360 radiator (push) with only the CPU in the loop. I thought about adding a 280, but it would only be for Prime95, which I’m not comfortable running for a long time. Maybe I should just set a lower temperature limit and be done with it.

ASUS ROG Strix TRX40-E Gaming, memory clock/fabric clock at 1600MHz, other settings default, Prime95 v29.8.

I’m not seeing any failures after running the torture test for about 20 minutes. The CPU is sitting at power limit (280W) and temps are ~80°C at ~3.7GHz. The CPU boosts to about 3.9GHz, but temps go up to ~85°C and it clocks back down to 3.7GHz - this is with a Noctua U14S TR4-SP3 air cooler. I could probably get better boost with a custom loop.

Hi,
The poster with the 360 rads did mention they had 3 of them in their loop.
You might want to factor thickness and fin density into your calculations?
The person has one at 30mm deep, one at 45mm and one at 60mm deep.
If a rad is twice as thick, it might have more of a thermal buffer, due to more metal, and twice the fin area (fin density might differ) for cooling?

Just a thought

I should add that the 94C figure was with PBO, pinned at 3.9GHz all-core.
With PBO off, it’s 66C max after 20 mins. Core around 3.6-3.7GHz.