Hi all. Longtime reader, first time poster. I could use some guidance about the real-world consequences of memory speed. I'm building a PC for content creation, not gaming, and am leaning towards a 9950X3D (or 9950X) on an Asus X870E ProArt Creator with an RTX 5080.
My work varies pretty widely: Photoshop; motion graphics in After Effects; 4K-8K editing in Premiere Pro; CPU/GPU rendering in Cinema 4D; and high-res RAW image processing for photogrammetry in Reality Capture. All workflows that benefit from a lot of RAM.
I initially thought I'd get a 4x48GB kit (Corsair Vengeance 192GB DDR5-5200, CMK192GX5M4B5200C38) but am concerned about the platform throttling to DDR5-3600 with all four slots populated.
One alternative I found is a Crucial 2x64GB kit (128GB DDR5-5600 PC5-44800 CL46 CT2K64G56C46U5).
I guess the gist of my question is whether the apps I work with will be noticeably slower running at 3600 compared to 5600. Are there any other factors I'm not taking into consideration?
Puget Systems has some decent benchmarks for your type of workload where you can see the impact of the 9950X vs. the 9950X3D. Their benchmarks:
It's somewhat safe to extrapolate from these. The workloads where you see a big delta due to the extra cache are the ones hitting main memory a lot. Those are also the tests where the slower memory will hurt.
The question becomes, how important is 10-20% of performance?
Are your workloads even highly memory bandwidth dependent?
In my experience, outside of gaming and the like, memory speed hasn't had a huge impact on the things I do.
All of that said, pay attention to whether the larger memory modules you get are single or dual rank. You can think of a dual rank memory module as essentially two RAM sticks in one.
Two single rank modules = traditionally best performance (but it depends*)
Four single rank modules = traditionally worse performance (but it depends*)
Two dual rank modules = essentially the same thing as four single rank modules.
*If your system can maintain the same bandwidth settings, it really depends on the workload: it is not always straightforward which workloads will be hurt (slightly) by multiple sticks per channel, and which can benefit (slightly) from them.
Benefits include that your address range per channel goes up to a full 64 bits (from 32 bits). Drawbacks involve needing to interleave between them and limiting bandwidth on the channel. Some productivity loads like two sticks per channel better, some perform worse. Some games like two sticks per channel, some perform worse. You really have to benchmark your own workload.
Reduced clocks when multiple sticks are installed - as you have mentioned - have traditionally been an issue with multiple sticks per channel.
AMD specifies the 9950X3D's RAM capabilities as follows:
2x1R - DDR5-5600
2x2R - DDR5-5600
4x1R - DDR5-3600
4x2R - DDR5-3600
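Just to put rough numbers on those supported speeds: this is my own back-of-the-envelope arithmetic (assuming the usual dual-channel, 16-bytes-per-transfer DDR5 bus), not anything AMD publishes, but it shows the theoretical peak bandwidth gap you'd be signing up for:

```python
# Rough theoretical peak bandwidth for the supported speeds listed above.
# Dual-channel DDR5 is a 128-bit (16-byte) wide bus; real throughput will
# be lower, this is only meant to show the relative gap.
BUS_BYTES = 16  # assumption: 2 channels x 64 bits

supported = {
    "2x1R": 5600,
    "2x2R": 5600,
    "4x1R": 3600,
    "4x2R": 3600,
}

for config, mts in supported.items():
    gb_per_s = mts * BUS_BYTES / 1000  # MT/s * bytes per transfer -> GB/s
    print(f"{config}: DDR5-{mts} ~= {gb_per_s:.1f} GB/s theoretical peak")
```

On paper that is roughly a 36% drop in peak bandwidth going from the two-DIMM spec to the four-DIMM spec (89.6 GB/s vs. 57.6 GB/s); whether your apps actually feel that is the real question.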
As with all things, there is some room for tweaking here. Notably, some people run even DDR5-8500 on their AM5 systems, and while it is not supported, it works.
Just because AMD says the max supported speed is DDR5-3600 with four single rank sticks does not mean that setting them at their rated speeds won't work, BUT it isn't going to be guaranteed.
I have also read that non-binary* DDR5 RAM sizes (24, 48, 96 GB, etc.) put more stress on the memory controller than traditional binary sizes (16, 32, 64 GB, etc.), so I would try to stick to the binary sizes if possible.
*(no pun intended, please don’t drag culture wars into this, I just can’t think of a better term for this right now)
Another thing to keep in mind is that RAM on AMD is a little more complicated due to the chiplet design, with a separate I/O die and the Infinity Fabric that ties it all together.
There are three different clocks that you should keep in mind:
- MCLK: The clock of the RAM itself. It is half the DDR5-xxxx figure (due to DDR), so a DDR5-6000 stick runs at 3000 MHz.
- UCLK: The "uncore" clock. The memory controller runs at this speed.
- FCLK: The clock of the Infinity Fabric that ties everything together, which can be another bottleneck.
AMD used to recommend running these three 1:1:1, at the same speed, but that changed with AM5. The recommendation is now to leave FCLK at "Auto" (which usually results in a clock of 2000-2100 MHz, depending on the motherboard and how lucky you were in the silicon lottery with the CPU).
They then say you get the best performance if you run UCLK and MCLK at 1:1 (the same clock).
There is no benefit from running UCLK faster than your MCLK, but if you can't hit 1:1, you generally have to drop down to a divider of 1:2, and then UCLK runs at half your RAM clock, which has performance implications.
UCLK will usually hit 3000 MHz stably on all AM5 CPUs, which is why DDR5-6000 is the most popular RAM speed for AM5: it hits that 1:1.
If you are lucky, UCLK on your CPU can be stable up to 3100 MHz, and then the best RAM to get is likely DDR5-6200, but the problem is there is no way of knowing without testing it.
All of that said, there can be performance benefits from running RAM faster than UCLK. For instance, a very fast clocked RAM stick can complete operations (such as clears) that don't use bandwidth and instead rely on waiting for clocks to go by, faster. The problem is that there is nothing between 1:1 and 1:2 on the UCLK to MCLK ratio.
So the implication is: if you go out and buy the highest-clocked EXPO DDR5 RAM there is right now (which, if you are looking for EXPO profiles, and you should be for RAM on AMD, appears to be DDR5-8000), the following happens.
Your DDR5-8000 RAM will have an MCLK of 4000 MHz, but your UCLK won't be able to hit 4000 MHz, so it will have to run at 1:2, which is 2000 MHz. This will bottleneck your memory bandwidth.
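To make that divider logic concrete, here is a tiny sketch of how I think about it. The 3000 MHz UCLK ceiling is just the rule-of-thumb figure from above (it varies with your CPU's silicon lottery), so treat it as an assumption:

```python
# Effective MCLK/UCLK for a given DDR5 rating, using the rule of thumb that
# AM5 memory controllers reliably do about 3000 MHz (DDR5-6000 at 1:1).
# There is nothing between 1:1 and 1:2, so past the ceiling UCLK halves.
UCLK_CEILING_MHZ = 3000  # assumption: typical stable UCLK, not an AMD spec

def effective_clocks(ddr5_rating: int) -> tuple[int, float, str]:
    mclk = ddr5_rating // 2            # DDR: MCLK is half the MT/s figure
    if mclk <= UCLK_CEILING_MHZ:
        return mclk, float(mclk), "1:1"
    return mclk, mclk / 2, "1:2"       # forced down to the 1:2 divider

for rating in (5600, 6000, 6400, 8000):
    mclk, uclk, ratio = effective_clocks(rating)
    print(f"DDR5-{rating}: MCLK {mclk} MHz, UCLK {uclk:.0f} MHz ({ratio})")
```

Run it and you can see why DDR5-6000 is the sweet spot: on a typical controller DDR5-6400 already tips you into 1:2, and DDR5-8000 leaves UCLK at 2000 MHz.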
There are still some workloads (with many clear operations) that might benefit in a setting like this, but most workloads will benefit from higher UCLK bandwidth instead.
Now, if DDR5-12000 RAM were a thing, that would probably be amazing on an AM5 system (if you could get it to run), as you'd be able to sit at the max guaranteed UCLK of 3000 MHz, have your MCLK at double that, and benefit from more clocks passing faster for things that involve waiting for clocks to pass.
Anyway, that is neither here nor there, as this RAM will likely never exist, and who knows if those clocks could even be supported if it did.
But that is the theory of it all.
Try to keep your CAS latency (and other sub timings) low
Try to keep your clocks high, but not so high that you get pushed into 1:2.
Try to use single rank sticks if you can.
Try to use two sticks if you can rather than 4.
Try to avoid non-binary RAM size numbers. (24, 48, 96, etc.)
Here is what I would do in your situation:
I’d buy a kit that meets your needs (though I’d probably try to avoid the non-binary sizes) from a retailer with an excellent return policy (like Amazon) and test them.
I'd likely go for DDR5-6000 and try to get a lower CAS latency (I've seen them as low as CL24, but CL26 and CL28 are the most commonly available lower-timing sticks). The CAS latencies of your two example kits are really high! But maybe that is due to the limited availability of 64GB sticks…
When considering latency, keep in mind it is specified in clock cycles, so, as an example, a DDR5-6000 stick with CL30 is going to be lower latency than a DDR5-5600 stick at CL30, because those 30 clocks go by faster at 6000 MT/s than at 5600 MT/s.
For comparison purposes, TechPowerUp has a decent calculator that can help you see the true latency of each, measured in ns:
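If you'd rather just do the arithmetic yourself: one memory clock lasts 2000 / (MT/s) nanoseconds, so the true CAS latency is CL x 2000 / (MT/s). A quick sketch using the kits mentioned in this thread (the CL values come from the part numbers), plus a typical DDR5-6000 CL30 kit for comparison:

```python
# Convert CAS latency from clock cycles to nanoseconds.
# One DDR5 clock period is 2000 / (MT/s) ns, so latency_ns = CL * 2000 / MT/s.
def cas_ns(mt_per_s: int, cl: int) -> float:
    return cl * 2000 / mt_per_s

kits = [
    ("Corsair 4x48GB DDR5-5200 CL38", 5200, 38),
    ("Crucial 2x64GB DDR5-5600 CL46", 5600, 46),
    ("Typical DDR5-6000 CL30 kit",    6000, 30),
]

for name, mts, cl in kits:
    print(f"{name}: {cas_ns(mts, cl):.1f} ns")
```

That works out to roughly 14.6 ns and 16.4 ns for your two candidate kits versus about 10 ns for a 6000 CL30 kit, which is why I'd shop for lower timings if you can.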
Pop them in, configure them to run at their rated speeds, and if it doesn't work, return them and buy something else. It is very possible you can get four sticks to run at their full rated speed, but your chances are higher if you stick with two single rank sticks.
And this is where the "non-binary is harder on the memory controller" bit comes in. You can get away with running RAM outside the specs AMD lists on their website, but the harder your configuration loads the controller, the less likely that is to work.
I would test your RAM with PassMark's MemTest86. If it passes a full run (which will take quite a while), you are pretty much guaranteed it is stable and works well.
But then it comes back to the whole “will you notice the difference” question.
If you were planning on playing games, especially if you were a competitive, super-high-framerate type of player, I'd say yes, this will make a difference. For most productivity/workstation workloads, RAM bandwidth makes much less of a difference.
There are exceptions, though, especially when it comes to some scientific and other highly compute-intensive workloads that work with large datasets that fit in RAM.
I think if you ran benchmarks on your photo/video workloads, you'd measure a performance improvement from higher RAM speeds, but I highly doubt it would be a noticeable difference in your day-to-day work.
Hope this helps.
Edit:
It looks like maybe I was a little bit too optimistic when it comes to the chances of achieving DDR5-6000 with four sticks, but you might find this thread interesting:
A few folks in there have gotten their four sticks stable at 5600. 4800 is also prevalent. I guess the takeaway here is "definitely higher than AMD's specced 3600," but maybe not quite up there at 6000.
One guy is even running 4x64GB at 4800 (he used DDR5-6400 sticks that he underclocked to achieve this). He said 4800 is where he stopped testing, so he may even have been able to go higher.
More specifically, out of the box DDR5-5200 2DPC 2R is fairly rare. Being able to manually tune it in is also rare. I've gotten 4x48 at 5200 to go several minutes before the first error, but I haven't had an IMC toss in the silicon lottery where it's been possible to stabilize. So far my AM5 luck has been 4x48 overclocks stable at 4000, 4400, and 4800, though I haven't had time to try to coax the 4400 higher yet.
Specifically, one of those builds uses CMK192GX5M4B5200C38, and my AM5 experience with it is that applying the XMP profile bricks POST. Flashback is required for recovery.
Lower latency than Arrow’s IO die and Ryzen cores are more bandwidth efficient than anything Intel’s done lately. ¯\_(ツ)_/¯
All the boards I've worked with auto FCLK to the supported bound of 2000 MHz from DDR5-4000 up. So, outside of overclocking, it should only be at DDR5-3600 that you'd want to bench with FCLK at both 1800 and 2000. Unless the BIOS autos into an overclock, which I'd argue is a defect.
There are a few similar issues with the details of your description of the clock tradeoffs. TL;DR: different workloads respond differently to different bandwidth and latency configs, but I don't know of any published memory scaling data which directly addresses @SeanM's questions. TechPowerUp did include After Effects, Premiere Pro, and Zephyr (a different photogrammetry tool) in their AM5 DDR scaling benches. The first two like bandwidth and Zephyr likes lower latency.
I can also say that, for some of the in house number crunching benchmarks I run, DDR5 scaling is nonmonotonic depending on compute kernel width.
Just to make this more complicated: if you're not tight on budget, the preferred comparison right now is probably two 2x48 M-die kits (e.g. CMK96GX5M2B6000Z30) versus 2x64 B-die (CT2K64G56C46U5), rather than 4x48 B-die (CMK192GX5M4B5200C38).
2x2x48 M-die helps offset the downclock by reducing latency, and there's not enough data to tell if it affects the odds of hitting higher clocks. I've got DDR5-4000 at 24-26-26-38-48-64, for example, which isn't terrible, as the theoretical rating bound for the kits I'm using is 22-26-26-36-42-62 (IIRC) and I haven't tried CL23 or throwing voltage at them yet.
No. If "definitely higher than 3600" were a thing, then AMD's 2DPC 2R spec would be higher to compete with Arrow Lake's DDR5-4400. Realistically, I'd say plan on 3600, and if you can make the +33% overclock to 4800 that's a pretty nice bonus. The +67% to 6000 seems to take a golden combo.
With the 4000 build I have, if you don't install the 2x2x48s (which are the same kit and have identical EXPO profiles down to the terminations) in a certain order, 3600 is what you get. Try to train 3800 or 4000 and the BIOS falls back to 3600. 4200+ is a giant fucking wall of nope: the BIOS red lights out and doesn't even try to train. DIMM arrangement, terminations, voltages, how slack the timings are, nothing I've tried makes a difference. Flashback is required on every attempt.
I can also kill it by tightening a few of the timings by two clocks. Some of those might boot and pass 10 minutes of memory stress, or they might brick POST. So that's fun.
I don’t know of evidence to support this. The dataset’s so small I can’t point to a difference in success between 16 Gb A-die and 24 Gb M-die. Memory overclockers don’t usually buy B-die, so even less data there.
If there is an effect the most likely cause I can point to is DIMM layout differences due to changes in DRAM package shape, in which case there’s little reason to expect the claimed behavior to hold across 16, 24, and 32 Gb.
Naturally, it is all very load dependent. But, in general, by going from two sticks per channel to one stick per channel on the AM5 platform, you will see the most gains in multithreaded workloads.
Not super big gains, I would say 7% at the most.
Single-threaded workload gains would be barely noticeable.
As mentioned, MHz gains, while beneficial overall, do get offset by looser timings.