Cooling a Threadripper 7000 & DDR5

I was running into some weird performance issues on a TR 7980X build when benchmarked against a prior 5975WX. I mainly use it for data processing in Spark and saw that short jobs (1-5 min) had a 20-30% improvement, whereas longer jobs (10+ min) had close to a 50% regression in performance. Code compilation in Java and TypeScript also improved by roughly 30%, so most of the gain seemed to come from IPC rather than the increase in cores. Blender and Geekbench were in the range of other 7980X builds, though.

The main spec difference between the two builds was memory. The 5975WX had 512GB of DDR4-3200 while the 7980X had 384GB of DDR5-5600 (Micron 96GB DDR5-5600 running at 5200). My main theory for the Spark performance regression was the larger Linux file cache on the 5975WX and the lower memory per core on the 7980X causing increased GC. However, CPU temps also looked curious, as they reached 87C when many reviews mentioned better thermals on the 7980X vs the 7970X.
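
To put a rough number on the memory-per-core gap, here is a minimal sketch. The core counts (32 for the 5975WX, 64 for the 7980X) come from AMD's published specs rather than from anything measured here, and `gb_per_core` is just an illustrative helper name.

```python
# Core counts are from AMD's published specs (an assumption added here, not measured).
# RAM sizes are the ones from the two builds described above.
BUILDS = {
    "5975WX": {"cores": 32, "ram_gb": 512},
    "7980X": {"cores": 64, "ram_gb": 384},
}


def gb_per_core(ram_gb: float, cores: int) -> float:
    """Installed RAM divided by physical core count."""
    return ram_gb / cores


for name, spec in BUILDS.items():
    print(f"{name}: {gb_per_core(spec['ram_gb'], spec['cores']):.1f} GB per core")
```

That works out to 16GB per core on the old build versus 6GB on the new one, which would fit the increased-GC theory if Spark executors are sized to scale with core count.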

I ended up doing some profiling in Windows, since memory temps aren’t available in kernel 6.5, and saw that the topmost DIMM was actually thermal throttling. It would reach 80C within 2 min of y-cruncher, and read/write throughput would drop from 60-70GB/s to half that. I was surprised it was the top DIMMs, which are close to the top exhaust, rather than the more buried lower two.
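
A throughput drop like this can also be spotted without temperature sensors by logging sustained copy bandwidth over time. The following is only a sketch: `copy_throughput_gbps` and `find_throttle_point` are hypothetical helper names, the 60% drop threshold is an arbitrary assumption, and a single-threaded Python copy will land far below the 60-70GB/s that y-cruncher reaches, so only the relative drop is meaningful.

```python
import time


def copy_throughput_gbps(buf_bytes: int = 256 * 1024 * 1024, repeats: int = 4) -> float:
    """Time a few large in-memory copies and return bytes copied per second, in GB/s."""
    src = bytearray(buf_bytes)
    dst = bytearray(buf_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-size slice assignment copies the whole buffer
    elapsed = time.perf_counter() - start
    return (buf_bytes * repeats) / elapsed / 1e9


def find_throttle_point(samples, baseline_count: int = 3, drop_ratio: float = 0.6):
    """Return the index of the first sample below drop_ratio * baseline, else None.

    The baseline is the mean of the first `baseline_count` samples.
    The 0.6 default is an arbitrary threshold chosen for illustration.
    """
    if len(samples) <= baseline_count:
        return None
    baseline = sum(samples[:baseline_count]) / baseline_count
    for i, value in enumerate(samples[baseline_count:], start=baseline_count):
        if value < baseline * drop_ratio:
            return i
    return None


def log_throughput(duration_s: float = 300.0, interval_s: float = 5.0):
    """Sample copy throughput every interval_s seconds for duration_s seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        samples.append(copy_throughput_gbps())
        time.sleep(interval_s)
    return samples
```

Calling `find_throttle_point(log_throughput())` while a memory-heavy load runs in another process would return the sample index where bandwidth first fell well below the starting level, or `None` if it never did.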

Ended up printing some shrouds to direct the rear fan to blow air over the DIMMs.

This kept everything below 78C (after the paper shroud) and prevented thermal throttling for the entire y-cruncher run. I also played around with some heatspreaders, but given the large PMIC on the DIMMs, I could only go with the cheap aluminum clip-on ones. These slowed the ramp-up in temp but resulted in slightly higher overall temps, so I ended up not using them.

I also printed some shrouds for the CPU exhaust, which gave about a 5C improvement.

Re-benchmarking Spark unfortunately showed no major change, but comparing y-cruncher results showed that memory bandwidth is likely not the issue (16 min on the 5975WX vs 10 min on the 7980X for 50B digits).
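
For reference, the y-cruncher numbers above translate into a speedup as follows. This is plain arithmetic on the two run times quoted in the post; the function names are just illustrative.

```python
def speedup(old_minutes: float, new_minutes: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_minutes / new_minutes


def time_saved_pct(old_minutes: float, new_minutes: float) -> float:
    """Percentage reduction in wall-clock time going from old to new."""
    return (old_minutes - new_minutes) / old_minutes * 100


# y-cruncher, 50B digits: 16 min on the 5975WX, 10 min on the 7980X
print(f"speedup: {speedup(16, 10):.2f}x")
print(f"time saved: {time_saved_pct(16, 10):.1f}%")
```

So the 7980X finishes the same y-cruncher run 1.6x faster (37.5% less time), the opposite direction from the regression seen on long Spark jobs.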

Wow. This is something that needs to be on Thingiverse :yum:

Possibly a software or configuration issue? How do your scores match up to these

And also these

I’ve got workloads that will hit the memory hard enough that thermal throttling or stability issues arise even with decent airflow, so you’re not alone.

I ran a couple and the results were as expected. I don’t think those benchmarks are that realistic for what Spark is most commonly used for: they finish pretty fast (seconds vs minutes or hours), and the synthetic test data is pretty skinny. The dataset I was benchmarking with is roughly 400GB in zstd. It’s probably something nuanced.