Maximizing DRAM performance

On a WRX90 system with 8 DRAM channels and a Threadripper PRO, what memory configuration would yield the highest performance (with cost not being a factor and higher capacity not the goal)?
8 single-rank UDIMMs?
4 double-rank UDIMMs?
8 double-rank UDIMMs?
Would using RDIMMs instead of UDIMMs yield better performance?


Modern Threadripper cannot use UDIMMs. That being said, you'll be able to achieve the highest frequencies with the 24 Gbit M-die RDIMMs in 1Rx8 modules.
There are some workloads that will get slightly higher bandwidth out of dual-rank memory, however.

If maximum DRAM performance is required, Intel's more server-y platforms are achieving about double the bandwidth AMD can muster on their best server platforms, because Intel adopted multiplexed DRAM. Micron is already shipping 8,800 MT/s MRDIMMs, and 12,800 MT/s parts are sampling if you are someone special.
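For a rough sense of scale, theoretical peak bandwidth is just channels × transfer rate × 8 bytes per transfer. Here is a minimal sketch, assuming an 8-channel DDR5-6400 RDIMM setup on WRX90 and a 12-channel MRDIMM-8800 platform; both configurations are assumptions for illustration, not measured numbers:

```python
# Back-of-the-envelope peak DRAM bandwidth: channels x MT/s x 8 bytes per transfer.
# Both configurations below are illustrative assumptions, not vendor-verified specs.
def peak_gbps(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (decimal)."""
    return channels * mt_per_s * 1_000_000 * bytes_per_transfer / 1e9

configs = {
    "8ch DDR5-6400 RDIMM (assumed WRX90 build)": (8, 6400),
    "12ch MRDIMM-8800 (assumed server platform)": (12, 8800),
}

for name, (ch, mts) in configs.items():
    print(f"{name}: ~{peak_gbps(ch, mts):.0f} GB/s peak")
```

Real sustained bandwidth lands well below these peaks, but the ratio between the two configurations is the useful part.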


this shit keeps me up at night

but my understanding is the DIMM is speed-rated as a whole, including both ranks

the new multiplexed DIMMs switch between the ranks on the DIMM to maximize speed

Thanks for your input.
You mean MRDIMMs?

The problem is that (last time I checked) Xeon has lower per-core performance than Threadripper.

It all starts with the workload, then testing.

I mean, if the workload fits in the X3D cache, memory speed is a nothing burger.
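As a crude way to check that, compare the hot working set to the L3 on offer; the cache sizes below are just illustrative assumptions (roughly standard vs 3D V-Cache EPYC-class parts), and actually benchmarking the workload still matters more:

```python
# Crude cache-fit check: does the hot working set fit in L3?
# The L3 sizes and working-set figure are illustrative assumptions only.
MIB = 1024 * 1024

l3_sizes = {
    "standard part (assumed 384 MiB L3)": 384 * MIB,
    "3D V-Cache part (assumed 1152 MiB L3)": 1152 * MIB,
}

working_set = 512 * MIB  # hypothetical hot data size for the workload

for part, l3 in l3_sizes.items():
    verdict = "likely cache-resident" if working_set <= l3 else "spills to DRAM"
    print(f"{part}: {verdict}")
```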

With some luck Phoronix has benchmarks for your application.

But MRDIMMs are not generally available yet…


Yes, they are MRDIMMs for now… until JEDEC inevitably renames them later this year.

It depends on the workload. In the Phoronix article @quilt shared, Intel's best performs at double the speed of the best 3D V-Cache EPYC CPUs AMD has to offer in the AMG workload, which works out to Intel having more performance per thread than AMD in this workload. There are definitely specific workloads where AMD pulls ahead too (although not as many as there used to be now that Granite Rapids is a thing).

AMD is rumored to get MRDIMM support for their Zen 6 EPYCs next year.

My take on 3D V-Cache is that if the workload is small enough to fit into it, the workload falls on the wrong side of Amdahl's law. There are some exceptions to this, though: very simple/small routines that are repetitive and parallelizable do seem to benefit from 3D V-Cache, but these types of workloads are prime targets to port to a GPU or other accelerator.
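To put the Amdahl's law point in numbers, here is a minimal sketch of the classic speedup formula; the parallel fractions and core count are made-up values purely for illustration:

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), with parallel fraction p on n cores.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

n_cores = 96  # hypothetical core count
for p in (0.50, 0.90, 0.99):
    print(f"parallel fraction {p:.0%}: ~{amdahl_speedup(p, n_cores):.1f}x speedup on {n_cores} cores")
```

With a 50% parallel fraction, even 96 cores only buy roughly a 2x speedup, which is the "wrong side of Amdahl's law" situation described above.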

Us normies can buy MRDIMMs now ("MFG Drop Ship" is normal for this site; I just received some SM motherboards last week that were ordered at this status):
