BAD Performance = TR PRO 5955WX + ASRock WRX80 Creator + 2 DIMMs ECC RAM

The performance of my PC is very bad. I’m using an ASRock WRX80 Creator + AMD Ryzen Threadripper PRO 5955WX + 64GB of DDR4-3200 ECC memory in 2 slots + RTX 3060 Ti GPU + 2TB M.2 NVMe SSD. Many benchmarks show that I’m way behind expected performance (40% to 60% worse than expected for this kind of machine). Is it all because of the RAM? I wanted to buy 2 memory DIMMs first (because it’s cheaper) and later upgrade to 4, 6, and eventually 8. But I never thought this would have such a huge impact. I expected about 10-20% worse performance, not 40-60% worse. Could that be related to the RAM, or could there be another reason?

In short, yes.

This is an 8-channel CPU and you are populating 1/4 of the channels. You are doing well if you are getting 50% of the expected performance.
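The 1/4 figure is easy to put into numbers. Below is a minimal sketch of theoretical peak bandwidth by populated channel count. It assumes DDR4-3200 and the usual 64-bit (8-byte) data path per channel, with ECC bits not counted. Sustained bandwidth is lower in practice, and only bandwidth-bound work scales with it.

```python
# Theoretical peak DRAM bandwidth for DDR4: each channel moves 8 bytes
# (64 data bits; ECC bits are not counted) per transfer.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mts: int, channels: int) -> float:
    """Peak bandwidth in GB/s (decimal) for `channels` channels at `mts` MT/s."""
    return mts * BYTES_PER_TRANSFER * channels / 1000


if __name__ == "__main__":
    full = peak_bandwidth_gbs(3200, 8)
    for ch in (2, 4, 8):
        bw = peak_bandwidth_gbs(3200, ch)
        print(f"{ch} channels: {bw:.1f} GB/s ({bw / full:.0%} of the 8-channel peak)")
```

With DDR4-3200 this works out to 51.2 GB/s for two channels against 204.8 GB/s for eight.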

There are things you can do, and workloads you can test with, that will show better performance, but it will still suffer.

You can go to a 4-channel setup and get around 80% of the expected performance, depending on the workload. But Threadripper and EPYC really expect a full RAM loadout unless you are in a very specific use case.

Pretty much what he said.

You can also enable PBO and set a +200 MHz boost frequency override, plus a negative Curve Optimizer offset, so that all cores boost higher.

I did -30 on a 5975WX, but your stability mileage may vary.

It probably won’t help if you’re that gimped on memory bandwidth.

I ran a 7742, and now run a 7713P in an ASRock Rack ROMED6U-2L2T, so I have a lot of experience with this. I’ve never tried two DIMMs, but you can achieve full performance with 4 DIMMs (on Rome) or 6 DIMMs (on Milan). Very few workloads are entirely memory bound.

The myth of losing X% of performance by not populating all channels is not borne out in my testing and extensive usage, because my workload (rendering) runs in cache. Moving from four to six DIMMs with my 7713P did not increase performance at all: literally 0% improvement.
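The gap between "50% of expected performance" and "literally 0% improvement" comes down to how much of the runtime is actually spent waiting on DRAM. A toy Amdahl-style model makes that explicit. It is an assumption for illustration, not something measured in this thread. In the model, only the bandwidth-bound share of the runtime stretches when bandwidth drops.

```python
def relative_performance(mem_fraction: float, bandwidth_ratio: float) -> float:
    """Toy Amdahl-style model (an assumption, not a measured law).

    mem_fraction:    share of runtime that is DRAM-bandwidth bound with all
                     channels populated (0.0 = fully cache-resident).
    bandwidth_ratio: available bandwidth relative to the full configuration
                     (e.g. 2 of 8 channels -> 0.25).
    Returns performance relative to the full configuration (1.0 = no loss).
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be in [0, 1]")
    if bandwidth_ratio <= 0.0:
        raise ValueError("bandwidth_ratio must be positive")
    # Compute-bound time is unchanged; bandwidth-bound time scales by 1/ratio.
    new_time = (1.0 - mem_fraction) + mem_fraction / bandwidth_ratio
    return 1.0 / new_time


if __name__ == "__main__":
    for f in (0.0, 0.1, 0.3, 0.5):
        two = relative_performance(f, 0.25)   # 2 of 8 channels
        four = relative_performance(f, 0.5)   # 4 of 8 channels
        print(f"{f:.0%} bandwidth bound: 2ch -> {two:.0%}, 4ch -> {four:.0%}")
```

Under this model, a cache-resident workload (`mem_fraction=0`) loses nothing on two channels. One that is 30% bandwidth bound drops to roughly 53%, and one that is 50% bandwidth bound drops to 40%.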

Something to remember is that certain benchmarks (PassMark comes to mind) factor memory bandwidth into your score, so a bandwidth-starved system scores lower even if core performance is unaffected. Without a better understanding of your workload or the benchmarks, I recommend populating two more channels (four in total) so you get the full benefit of interleaving.
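Interleaving is what lets extra channels help even a single sequential stream. The address space is striped across the populated channels so they can work in parallel. The toy round-robin model below shows a 1 MiB buffer being split evenly across however many channels are populated. It uses a hypothetical 256-byte granule and none of the address hashing that real AMD platforms use.

```python
from collections import Counter

# Hypothetical interleave granule for illustration; real AMD platforms use
# their own (often configurable) granularity and address hashing.
GRANULE = 256  # bytes


def channel_for(addr: int, n_channels: int) -> int:
    """Round-robin interleave: consecutive granules map to consecutive channels."""
    return (addr // GRANULE) % n_channels


def spread(start: int, length: int, n_channels: int) -> Counter:
    """Count how many granules of a sequential buffer land on each channel."""
    first = start // GRANULE
    last = (start + length - 1) // GRANULE
    return Counter(channel_for(g * GRANULE, n_channels) for g in range(first, last + 1))


if __name__ == "__main__":
    one_mib = 1 << 20
    for n in (2, 4, 8):
        print(f"{n} channels:", dict(sorted(spread(0, one_mib, n).items())))
```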

It has a lot to do with the workload and the chiplet layout, that’s correct. I run a few EPYC systems and a Threadripper desktop. The Threadripper desktop with Windows installed showed the most noticeable performance difference between 4-channel and 8-channel. The server workloads could be tuned with NUMA and all the server features, but Windows on the Threadripper FELT faster with 8 channels than with 4. The benchmarks I ran on the Windows install varied wildly, from no difference to a significant difference depending on the workload. So I assume it’s just easier for people to quote a generic X% than to write up performance per workload per RAM channel.
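One way to see whether a given workload is likely to care about channel count is to sweep the working-set size and watch where copy throughput falls off the cliff from cache to DRAM. Below is a minimal single-threaded sketch using only the Python standard library. It is a rough stand-in for a proper tool such as STREAM, not a replacement for it. The buffer sizes are arbitrary assumptions, and a single thread will not saturate eight channels, so treat the absolute numbers as indicative only.

```python
import time


def copy_bandwidth_gbs(size_bytes: int, total_bytes: int = 1 << 28, repeats: int = 3) -> float:
    """Best-of-`repeats` copy throughput in GB/s for a working set of `size_bytes`.

    Each copied byte is counted twice (one read plus one write), the same
    convention STREAM uses for its Copy kernel. Small buffers are copied many
    times per timing so interpreter overhead is amortised.
    """
    src = bytearray(size_bytes)  # zero-filled, so every page is already touched
    dst = bytearray(size_bytes)
    inner = max(1, total_bytes // size_bytes)
    dst[:] = src  # warm-up pass
    best = float("inf")
    for _rep in range(repeats):
        t0 = time.perf_counter()
        for _ in range(inner):
            dst[:] = src  # same-length slice assignment is a plain memory copy
        best = min(best, time.perf_counter() - t0)
    return 2 * size_bytes * inner / best / 1e9


if __name__ == "__main__":
    # From cache-sized buffers up to 64 MiB each (128 MiB touched per pass).
    for kib in (32, 512, 8 * 1024, 64 * 1024):
        print(f"{kib:>6} KiB: {copy_bandwidth_gbs(kib * 1024):6.1f} GB/s")
```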

Have you checked the RAM stick layout? Have you placed your RAM in slots C1 and D1?
If not, move them to the correct slots for dual-channel operation.
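On Linux, `dmidecode -t memory` (run as root) prints one "Memory Device" block per slot, including its "Locator" and "Size". The minimal sketch below lists which slots are actually populated, so they can be compared against the board manual’s recommended order. The locator strings such as `DIMM_C1` in the sample data are hypothetical, since each firmware names its slots in its own way.

```python
import subprocess
from typing import Dict, List, Optional


def parse_memory_devices(text: str) -> List[Dict[str, str]]:
    """Split `dmidecode -t memory` output into one dict per "Memory Device" block."""
    devices: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Memory Device":
            current = {}
            devices.append(current)
        elif not stripped or not line[0].isspace():
            current = None  # a blank line or the next "Handle ..." header ends the block
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return devices


def populated_slots(devices: List[Dict[str, str]]) -> List[str]:
    """Locators of slots that report an installed module."""
    return [
        d.get("Locator", "?")
        for d in devices
        if d.get("Size", "").lower() not in ("", "no module installed")
    ]


if __name__ == "__main__":
    try:
        # Linux only; dmidecode normally needs root to read the SMBIOS tables.
        out = subprocess.run(
            ["dmidecode", "-t", "memory"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run dmidecode: {exc}")
    else:
        print("populated slots:", populated_slots(parse_memory_devices(out)))
```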

Other than bandwidth, there should be no negatives associated with running dual-, quad-, or octa-channel configs.
Yes, you will see smaller numbers in some benchmarks, but in others there will be zero difference, and possibly even performance gains depending on CAS latency.
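The CAS latency point is simple arithmetic. A latency in cycles only means something once it is converted to nanoseconds at the module’s data rate. The kits compared below are examples: CL22 is the JEDEC DDR4-3200AA timing typical of ECC RDIMMs, and CL16 is a common desktop kit. CAS is also only one component of total memory latency.

```python
def cas_latency_ns(cl: int, mts: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR memory performs two transfers per clock, so the memory clock in MHz is
    mts / 2, and CL cycles take cl / (mts / 2) microseconds = 2000 * cl / mts ns.
    """
    return 2000 * cl / mts


if __name__ == "__main__":
    for cl, mts in ((22, 3200), (16, 3200), (18, 3600)):
        print(f"DDR4-{mts} CL{cl}: {cas_latency_ns(cl, mts):.2f} ns")
```

DDR4-3200 CL22 works out to 13.75 ns, while CL16 at the same data rate is 10 ns.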

Way back when I was choosing between the (expensive) 3975WX and the (cheap) 3955WX, there was some discussion about why the 3955WX was so much cheaper.

The general conclusion was that the 3975WX and 3995WX had at least 4 CCDs (thus a full set of memory controllers), while the 3955WX and 3945WX only had 2 CCDs enabled… meaning only half the number of memory controllers.

Does anyone know if this was the case? If so, and if the 5955WX likewise only has 2 CCDs enabled, would this explain the big difference in possible bandwidth between the 5955WX and the 5975WX, even when both have access to a full set of 8 DIMMs?

That claim comes from a fundamental misunderstanding of the Zen architecture. On Zen 1, each die had its own two memory controllers. However, in Zen 2, 3, and 4, all memory controllers are located on the IO die.

While the 5955WX and 5975WX do indeed lack the full complement of 8 CCDs, all of these processors theoretically have access to the IO die’s 8 memory channels. They have the exact same number of “memory controllers.”

However, the way the IO die is laid out, they indeed lack CCDs “local” to some of the memory channels. You can read more about it in the following STH article: AMD EPYC 7002 Rome CPUs with Half Memory Bandwidth

The gist is, as the diagrams in that article show, that a low-end SKU with missing CCDs is left with memory channels in “quadrants” of the IO die that have no local CCDs. This lowers performance (in theory), but not capacity.
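To put rough numbers on "lowers performance (in theory)", one simplified model caps each CCD by the bandwidth of its Infinity Fabric link to the IO die. The sketch assumes the commonly cited Zen 2/Zen 3 figure of 32 bytes read per fabric clock per CCD, and an FCLK of 1600 MHz (1:1 with DDR4-3200). Both are assumptions, not measurements from this thread, and the quadrant locality effect described above comes on top of this.

```python
# Simplified model: achievable read bandwidth is the lower of the DRAM peak and
# the combined CCD-to-IO-die link bandwidth. Both constants are assumptions.
READ_BYTES_PER_FCLK = 32  # per CCD link, commonly cited for Zen 2 / Zen 3
BYTES_PER_TRANSFER = 8    # one 64-bit DDR4 channel


def ccd_read_ceiling_gbs(n_ccds: int, fclk_mhz: int) -> float:
    """Combined read bandwidth of all CCD links, in GB/s."""
    return n_ccds * READ_BYTES_PER_FCLK * fclk_mhz / 1000


def dram_peak_gbs(channels: int, mts: int) -> float:
    """Theoretical DRAM peak for the populated channels, in GB/s."""
    return channels * BYTES_PER_TRANSFER * mts / 1000


def read_ceiling_gbs(n_ccds: int, fclk_mhz: int, channels: int, mts: int) -> float:
    """Whichever side of the IO die is the bottleneck sets the ceiling."""
    return min(ccd_read_ceiling_gbs(n_ccds, fclk_mhz), dram_peak_gbs(channels, mts))


if __name__ == "__main__":
    for name, ccds in (("5955WX", 2), ("5975WX", 4)):
        for channels in (2, 8):
            ceiling = read_ceiling_gbs(ccds, 1600, channels, 3200)
            print(f"{name}, {channels} channels: ~{ceiling:.1f} GB/s read ceiling")
```

Under these assumptions, two channels cap both chips at 51.2 GB/s, so the DIMM count is the bottleneck either way. With all eight channels populated, the 2-CCD part would top out around 102.4 GB/s of reads, compared with roughly 204.8 GB/s for the 4-CCD part.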

It also allows AMD to market the whole line as 8-channel parts, when in reality users may not have access to 8-channel performance. However, I get the impression from the original post that his machine is under-performing compared to identical machines, so this probably isn’t relevant.

Actually, it’s more relevant than you give it credit for.
It shows that the minimum config needed to feed all the CCDs equally is 4 DIMMs on the 7742.
That corrects my earlier reply that 2 DIMMs should work fine.

Thanks!

I agree that four DIMMs is one of the more optimal configurations for Rome generation chips like the 7742. You’ll notice in AMD’s memory population guidelines for Rome that populating six slots is not a supported config for interleaving, whereas four slots is supported and will give each IO die quadrant one memory channel. This is a “balanced configuration” in that all quadrants receive a balanced amount of resources. So you want either 4 or 8 for Rome.
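The "balanced configuration" rule in the post can be written as a small check: every IO die quadrant should get the same, non-zero number of populated channels. This is a sketch only. The letter-to-quadrant map is hypothetical (two channels per quadrant), and the check ignores the additional six-DIMM interleaving modes that Milan supports.

```python
from collections import Counter
from typing import Iterable

# Hypothetical channel-to-quadrant map (two channels per quadrant) for
# illustration only; AMD's population guides give the real layout.
QUADRANT_OF = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2, "F": 2, "G": 3, "H": 3}


def is_balanced(populated: Iterable[str]) -> bool:
    """True if every IO die quadrant has the same, non-zero number of populated channels."""
    per_quadrant = Counter(QUADRANT_OF[ch] for ch in set(populated))
    counts = [per_quadrant.get(q, 0) for q in range(4)]
    return min(counts) > 0 and len(set(counts)) == 1


if __name__ == "__main__":
    layouts = {2: "AC", 4: "ACEG", 6: "ABCDEG", 8: "ABCDEFGH"}
    for n, chans in layouts.items():
        print(f"{n} DIMMs ({chans}): balanced={is_balanced(chans)}")
```

With this map, only the 4- and 8-DIMM layouts come out balanced, which matches the "either 4 or 8 for Rome" advice.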

I meant that my advice may not be relevant for the original poster, whose question was about his Ryzen Threadripper PRO 5955WX. In regard to this chip, I am not sure about the CCD layout, how this interacts with the motherboard’s memory population order, and so on.

Moreover, Milan brought support for extra interleaving modes (like support for six DIMMs). Luckily for us, AMD publishes extensive guides on this topic. And system integrators like Lenovo do benchmarking for comparison.

I went and checked.
The 5955WX seems to have 2 CCDs, so as long as the RAM is in the right slots,
2 sticks should work fine.

So I’m confused.

What benchmarks have you run that led you to conclude your performance is very bad?
Can you elaborate?

That was my thinking as well, so perhaps the motherboard’s memory load order is not optimized for two-CCD chips. If the memory was on the wrong IO die “quadrants” with one of these two-CCD SKUs, performance would probably be terrible.

@rrubberr - Thanks for the clarification! What I had in mind was “number of direct links to memory channels on the IO die”, not “number of controllers”. I actually had Patrick’s diagrams in mind.
