How to maximize Threadripper Pro 3995WX memory bandwidth?

I am looking to build a Threadripper Pro 3995WX system, but total memory bandwidth is of extreme importance, much more so than total memory capacity or ECC functionality. Memory bandwidth will directly affect the percentage return on investment. I have been researching the best memory modules and motherboard options, but the motherboard vendors are fairly vague about which memory settings their boards expose and which modules they support.

Ideally, I would like to use custom memory frequencies and timings, but from what I understand that is not supported on the platform in general. So I am left wondering whether I am limited to the SPD ratings of consumer-grade kits, and if so, whether those kits will even run at their rated latencies, given that those latencies are stored in XMP profiles rather than the standard JEDEC SPD profiles. The other consideration is that most 3200 MHz consumer kits run their advertised latencies at 1.35 V, whereas the standard JEDEC voltage is 1.2 V. So my concern is whether the system would even boot if the motherboard does not support raising the memory voltage.

For motherboards, the Supermicro M12SWA-TF interests me greatly because of the cost savings; however, if the Asus Pro WS WRX80E-SAGE SE offers better memory settings, then it is the obvious choice. I just have no idea what adjustments to memory settings these motherboards offer.

Eight memory channels with the fastest RAM clock you can find. And if you really meant what you said, then CL19 or even CL22 should be no problem for you because you care more about bandwidth than latency.
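To put numbers on the channel count: theoretical peak DDR4 bandwidth is simply channels x transfer rate x 8 bytes, since each channel is 64 bits wide. A minimal Python sketch (the function name is just for illustration; these are theoretical peaks, and sustained throughput in practice is noticeably lower):

```python
BYTES_PER_TRANSFER = 8  # each DDR4 channel is 64 bits (8 bytes) wide

def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s).

    `mts` is the DIMM speed in mega-transfers per second, e.g. 3200 for DDR4-3200.
    """
    return channels * mts * BYTES_PER_TRANSFER / 1000

for channels in (4, 8):
    print(f"{channels} x DDR4-3200: {peak_bandwidth_gbs(channels, 3200):.1f} GB/s")
```

Filling all eight channels doubles the theoretical peak relative to four (102.4 vs 204.8 GB/s at DDR4-3200), independent of CAS latency.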

Yes, total bandwidth is most important. However, as far as I am aware, 3200 MHz is the cap on what the platform allows you to set. I was already planning on 3200 MHz and populating all 8 channels, as that is the easiest part of the equation. The less clear part is what else can be done beyond that.

What performance, on what workload, are you chasing? I’ll test it. I have dual-rank 64 GB DIMMs right now, 512 GB total.


The specific application is RandomX. What I am looking for is information on (1) whether any kind of memory tuning is possible and (2) what the fastest achievable configurations are, by memory module type, if tuning is not possible.

RandomX benefits equally from lower latency and higher frequency, so the biggest question is how to tune the memory. That is why I am most interested in which options the different motherboards actually offer, as that will determine which memory and motherboard to use for maximum effect.
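One rough way to weigh latency against frequency is first-word CAS latency in nanoseconds: DDR transfers twice per clock, so one memory clock lasts 2000 / (MT/s) ns. A Python sketch; the timings below are illustrative examples rather than specific tested kits (CL22 is the JEDEC DDR4-3200AA timing), and CAS is only one of several timings (tRCD, tRP, plus Infinity Fabric overhead) that make up real access latency:

```python
def cas_latency_ns(cl: int, mts: int) -> float:
    """First-word CAS latency in nanoseconds.

    The memory clock runs at half the transfer rate (double data rate),
    so one clock cycle lasts 2000 / mts nanoseconds.
    """
    return cl * 2000 / mts

# Illustrative speed/timing combinations, not specific products.
kits = [
    ("DDR4-3200 CL16", 16, 3200),
    ("DDR4-3200 CL22", 22, 3200),
    ("DDR4-4266 CL19", 19, 4266),
]

for name, cl, mts in kits:
    print(f"{name}: {cas_latency_ns(cl, mts):.2f} ns")
```

A DDR4-4266 CL19 setting works out to under 9 ns, while a JEDEC 3200 CL22 kit is 13.75 ns, which is why whether the board allows tuning matters so much here.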

Well, you can use the fastest RAM across all eight memory channels. While we certainly weren’t accustomed to seeing Intel so utterly trounced in a segment it has traditionally ruled with pricing impunity, the Threadripper 3000 processors did fall short in one area: they didn’t enable all eight memory channels.
Populating all eight channels will also boost performance.

The CPU supports UDIMMs as well as LRDIMMs and RDIMMs, so why not just grab 8 DIMMs of DDR4-4266 Trident Z Royal and give it a whirl?

That’s exactly what I’m looking for: I need to know whether it supports memory frequency tuning like that. If not, two separate 3970X machines are actually the better deal. I am still in the infrastructure build-out phase, building the racks and cooling, so I still have time to make this decision.

Not so fast: it supports them, but only at non-XMP speeds, so you’re stuck at whatever JEDEC speed the particular kit runs.

If that is the case, then I may be better off running two tuned 3970X boxes instead of the 3995WX. I just have to watch the power draw.

This, of course, assumes that at the same ratio of cores to memory channels, the 3970X will perform better. I have looked at all the specs, and per thread the 3970X comes closest to the 3995WX because of the L3 cache available to it. The only remaining difference is memory support, which should favor the 3970X unless there is something I am missing.

The build-out is going into server racks with direct liquid cooling, using quick connects and with all the 2U cases plumbed in parallel. It should make for an interesting build; I’m thinking about vlogging it.

Maybe you and Wendell understand, but I am lost in the weeds.

It seems to me that the SPEED of RAM access will be a LOT higher with double the RAM channels, even if they ran at only two-thirds of the frequency. I completely fail to see what you gain by using 4 channels at 4266 instead of 8 at 3200.

It comes down to cores/threads per channel. In the 3995WX you have 64 cores/128 threads competing over 8 memory channels. In the 3970X, you have 32 cores/64 threads competing over 4 channels. It is simple division: both work out to 8 cores (16 threads) per channel, so two 3970X systems reach parity with one 3995WX. This only applies to applications that benefit from parallel workloads, which RandomX definitely does.
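The cores-per-channel argument can be checked with the same peak-bandwidth arithmetic, comparing total and per-core figures. A minimal Python sketch; the DDR4-4266 row is hypothetical and assumes the 3970X board actually holds that XMP speed on all four channels, which is not guaranteed:

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel is 64 bits wide

def peak_bandwidth_gbs(channels: int, mts: int) -> float:
    """Theoretical peak bandwidth in GB/s for `channels` at `mts` MT/s."""
    return channels * mts * BYTES_PER_TRANSFER / 1000

# (label, cores, threads, memory channels, DIMM speed in MT/s)
systems = [
    ("3995WX, DDR4-3200", 64, 128, 8, 3200),
    ("3970X,  DDR4-3200", 32, 64, 4, 3200),
    ("3970X,  DDR4-4266 (hypothetical)", 32, 64, 4, 4266),
]

for label, cores, threads, channels, mts in systems:
    total = peak_bandwidth_gbs(channels, mts)
    print(f"{label}: {cores // channels} cores / {threads // channels} threads "
          f"per channel, {total:.1f} GB/s total, {total / cores:.2f} GB/s per core")
```

At DDR4-3200 a single 3970X has half the total bandwidth of the 3995WX (102.4 vs 204.8 GB/s) but the same 3.20 GB/s per core, so two boxes match one 3995WX exactly. Any speed gained by tuning the 3970X goes straight into per-core bandwidth (about 4.27 GB/s per core in the hypothetical 4266 case).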

The reason for the 3970X over the 3990X is memory channels per core. Cache is not the differentiator: the 3970X has 128 MB of L3 cache, while the 3990X and 3995WX each have 256 MB, which works out to the same 4 MB per core on all three. The 3990X, however, feeds 64 cores from only 4 memory channels, so going with it halves the memory channels per core compared with either the 3970X or the 3995WX.
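The cache side can be checked against RandomX itself: each hashing thread works in a 2 MiB scratchpad that ideally stays in L3. A rough Python sketch, assuming AMD’s advertised L3 sizes are binary megabytes and ignoring that Zen 2 splits L3 into 16 MB slices per four-core CCX (each slice still holds 8 scratchpads, i.e. 2 per core):

```python
SCRATCHPAD_MIB = 2  # RandomX per-thread scratchpad size

# (name, cores, SMT threads, total L3 in MiB as advertised by AMD)
cpus = [
    ("3970X", 32, 64, 128),
    ("3995WX", 64, 128, 256),
]

def scratchpads_in_l3(l3_mib: int) -> int:
    """How many 2 MiB RandomX scratchpads fit in the given amount of L3."""
    return l3_mib // SCRATCHPAD_MIB

for name, cores, threads, l3_mib in cpus:
    fit = scratchpads_in_l3(l3_mib)
    print(f"{name}: {l3_mib / cores:.0f} MiB L3 per core, "
          f"{fit} scratchpads fit for {threads} threads")
```

Both chips can keep one scratchpad per SMT thread in L3, so per thread they look alike on cache; the difference between one 3995WX and two 3970X boxes comes down to memory.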

XMP isn’t guaranteed to run at the rated speed, especially on the higher-speed kits.
It’s probably safer to go with 3200 MHz 1.2 V JEDEC kits.

