Poor performance on Asus WRX80e, with Ryzen Threadripper Pro 5995wx

So I recently completed my first PC build. It's basically the Ryzen Threadripper Pro 5995WX on a WRX80E Sage SE WiFi motherboard running Ubuntu 20.04.


I built this so I could compile Rust code faster. Specifically this repo: github.com/paritytech/polkadot-sdk

But for some reason compiling this repo is 10 times slower on my PC than on my MacBook M1 Pro 64GB.

cargo check --all --benches takes about 5 minutes on my Mac and 50 minutes on my PC. Beyond frustrated at this point. Do I need a new motherboard? It seems the Sage SE II has PBO, which should help with overclocking this CPU, right? Or is there anything I'm doing wrong with my setup? I've tried both the acpi-cpufreq and amd-pstate drivers, and nothing is even close to my MacBook's performance.

I see that you only have 4 DIMM slots populated in a mobo with 8-channel memory architecture. This will have a significant impact on the overall performance.
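
To put a rough number on the channel count: theoretical peak DRAM bandwidth scales linearly with the number of populated channels. A minimal sketch of the arithmetic, assuming DDR4-3200 (the thread does not state the actual memory speed) and the standard 64-bit (8-byte) DDR4 channel width; the function name is just for illustration:

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers: int = 3200, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x MT/s x bytes per transfer."""
    # ASSUMPTION: DDR4-3200 and 8 bytes per transfer per channel.
    return channels * mega_transfers * bus_bytes / 1000


for channels in (1, 4, 8):
    print(f"{channels} channel(s): {peak_bandwidth_gb_s(channels):.1f} GB/s")
```

Under these assumptions four channels top out at half the eight-channel figure, and a layout that falls back to a single channel at one eighth. Measured throughput will be lower than any of these ceilings.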

Also, I’d look at the storage setup. It looks like you are using M.2 drives, likely consumer grade.

It’s going to take a lot of IO to feed those 64 puppies in your 5995wx.

So I had a feeling about the SSD. It seems that during my benchmarks SSD speeds are throttled all the way down to 85MB/s (jfc).

I’m using the Samsung 990 Pro 2TB without a heatsink. The motherboard has a heatsink, and the SSD temperature never crosses 50°C. So I'm not sure why it’s throttling here.

What kind of disk activity does that bench consist of?

85 MB/s is spot on for the low-queue-depth 4K random read limit of the Samsung 990 PRO.
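
The 85 MB/s figure can be turned into a per-request latency, which is the number that matters for a build touching many small files. A rough sketch, assuming 4 KiB reads issued one at a time (queue depth 1) and decimal megabytes; the helper names are made up for illustration:

```python
def qd1_iops(throughput_mb_s: float, block_bytes: int = 4096) -> float:
    """Reads completed per second when requests are issued one at a time."""
    # ASSUMPTION: decimal megabytes (1 MB = 1,000,000 bytes) and 4 KiB blocks.
    return throughput_mb_s * 1_000_000 / block_bytes


def qd1_latency_us(throughput_mb_s: float, block_bytes: int = 4096) -> float:
    """Average time per read in microseconds at queue depth 1."""
    return 1_000_000 / qd1_iops(throughput_mb_s, block_bytes)


print(f"{qd1_iops(85):.0f} IOPS, {qd1_latency_us(85):.1f} us per read")
```

That works out to roughly 48 microseconds per 4 KiB read, which would be consistent with the drive behaving normally for this access pattern and not throttling.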

If your test does not need much space, mount the temporary work directory as a memory-backed tmpfs and rerun it.
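
One way to try this without touching fstab is to point cargo's build output at a directory that already lives on tmpfs. A minimal sketch, assuming /dev/shm is a tmpfs mount (the default on most Linux distributions, worth confirming with `findmnt /dev/shm`) and that redirecting only the target directory through cargo's `CARGO_TARGET_DIR` variable is enough. The thread does not say which directory was actually moved, and the function names are hypothetical:

```python
import os
import subprocess
import tempfile


def tmpfs_build_env(base: str = "/dev/shm") -> dict:
    """Return a copy of the environment with CARGO_TARGET_DIR inside `base`."""
    # ASSUMPTION: `base` is a tmpfs mount with enough free RAM behind it.
    target_dir = tempfile.mkdtemp(prefix="cargo-target-", dir=base)
    env = dict(os.environ)
    env["CARGO_TARGET_DIR"] = target_dir
    return env


def run_check(repo_dir: str, env: dict) -> int:
    """Run the command from the original post inside `repo_dir`; return its exit code."""
    result = subprocess.run(
        ["cargo", "check", "--all", "--benches"], cwd=repo_dir, env=env
    )
    return result.returncode
```

Calling `run_check("polkadot-sdk", tmpfs_build_env())` from the directory that holds the checkout would then write build artifacts to RAM. Those artifacts can run to many gigabytes, so check free memory first, and remember the directory is gone after a reboot.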

The good thing with this platform is that you can remove the bottleneck easily.
Your board should be trivially compatible with the ASUS Hyper M.2 x16 Gen 4 AIC.

Also some light reading to brighten your day:

EDIT: I just noticed your physical memory layout and it's kinda weird looking. I have never seen an asymmetrical population pattern used for working multichannel operation.

Are you sure you are actually running in quad channel mode? The motherboard manual has no guidance at all on the proper slot population priority, so that's weird.

As pointed out in the comments above, it is recommended to populate all 8 DIMMs. When using fewer than 8 DIMMs, you can try the following configuration recommended by AMD:

The manual is distributed by Gigabyte for the Epyc, but it is made by AMD and can be applied to the ASUS WRX80e and Threadripper as well. The four-channel configuration is explained on the 9th page.


So I tried the memory-backed tmpfs and my benchmarks are now 2.5 times faster than on my M1 Pro MacBook. It’s clear that my SSD is throttling for whatever reason.

My next step will be to try running my benchmarks on Windows to check for throttling. That way I can rule out driver issues on Ubuntu as the cause. Hopefully I don’t have to replace my SSD drives.


Correct, but it looks like he has one DIMM in the upper bank and three in the lower. I'm sure that isn’t recommended.

Per Dahlia's source:

  • For best performance, AMD recommends populating all eight memory channels per socket,
    with every channel having the same capacity

  • If a customer chooses to populate only four channels in the system, AMD recommends limiting the processors in that system to those with 128MB or less of L3 and populating channels CDGH identically. This will enable four-way interleaving, which will generally provide the best performance with a four DIMM population
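
The four-DIMM rule quoted here is easy to encode as a quick sanity check. A small sketch, assuming the channel letters A to H in the AMD guidance map directly onto the slot letters printed on the board (which is not confirmed in this thread); the names are illustrative:

```python
RECOMMENDED_FOUR_CHANNEL = frozenset("CDGH")  # from the AMD guidance quoted in this post


def check_population(populated: str) -> str:
    """Compare populated channel letters against the quoted AMD four-DIMM layout."""
    # ASSUMPTION: board slot letters correspond one-to-one to AMD channel letters.
    channels = frozenset(populated.upper().replace(" ", ""))
    if len(channels) == 8:
        return "all eight channels populated"
    if channels == RECOMMENDED_FOUR_CHANNEL:
        return "matches the recommended four-channel layout"
    missing = "".join(sorted(RECOMMENDED_FOUR_CHANNEL - channels))
    return f"does not match CDGH (missing {missing})"


# "ABC F" is the layout reported for OP's board in this thread.
print(check_population("ABC F"))
```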

I am pretty sure OP's config is not correctly populated. It seems like it's either BCD F or C GHF.
Either way, this physical configuration will run either in an inefficient quad channel mode or, more likely, in a single channel config.

Memtest86 diag might be useful for quick testing.

EDIT: it's ABC F
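
Before reshuffling anything, it can help to confirm from the OS which slots the firmware actually reports as populated. A hedged sketch that parses the text output of `sudo dmidecode -t memory`; the sample text and the `DIMM_A1`-style locator names are invented for illustration, and real output from this board may label the slots differently:

```python
# ASSUMPTION: illustrative sample only, not captured from the board in this thread.
SAMPLE = """\
Memory Device
    Size: 32 GB
    Locator: DIMM_A1
Memory Device
    Size: No Module Installed
    Locator: DIMM_D1
Memory Device
    Size: 32 GB
    Locator: DIMM_F1
"""


def populated_slots(dmidecode_text: str) -> list[str]:
    """Return the locators of memory devices that report an installed module."""
    slots = []
    size = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            size = None  # start of a new device record
        elif line.startswith("Size:"):
            size = line.split(":", 1)[1].strip()
        elif line.startswith("Locator:") and size and size != "No Module Installed":
            slots.append(line.split(":", 1)[1].strip())
    return slots


print(populated_slots(SAMPLE))
```

On a real machine the input would be the captured command output, for example `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout` when run as root.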


Sorry I can’t give you more than one like. Excellent post. Thank you.

Since you’re testing different archs, OSes etc., I think you should compare your results with something using the same arch at least, before concluding that something is wrong.

I don’t have the ability to port polkadot-sdk, but I did try something that’s quite large: sccache 0.6.4 using Rust 1.73.0

With the following settings:

CARGO_PROFILE_RELEASE_LTO="true"
CARGO_PROFILE_RELEASE_PANIC="abort"
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1

Ryzen 7900 (bare metal)
2m5.67s real            6m55.76s user           13.63s sys

Neoverse N1 (4x @ 3GHz) VM
7m13.96s real           16m19.09s user          1m0.73s sys
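
One thing these numbers show is how little of this build runs in parallel with LTO and a single codegen unit. A quick sketch that converts the `time` output into an average number of busy cores, (user + sys) / real; the helper names are illustrative:

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a time string such as '6m55.76s' or '13.63s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def busy_cores(real: str, user: str, sys: str) -> float:
    """Average number of cores kept busy over the whole run."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


print(f"Ryzen 7900:  {busy_cores('2m5.67s', '6m55.76s', '13.63s'):.2f}")
print(f"Neoverse N1: {busy_cores('7m13.96s', '16m19.09s', '1m0.73s'):.2f}")
```

With these settings neither machine keeps more than about three and a half cores busy on average, so this particular benchmark probably says little about how a 64-core part would scale.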

I believe this^ to be incorrect, it should be one module per memory group.

Edit:
Please see my new post a couple of posts down with the latest info I have found (source: Gigabyte Support eTicket).


Weird, your recommended 4 DIMM config differs from the Gigabyte guidance.

GIGABYTE: CD + HG (same for boards with 16 DIMM slots)
your source: AC + EG

Maybe the Gigabyte document is older guidance that assumes half of the chiplets on the CPU (i.e. up to a 32-core config)? That would explain the asymmetry. Testing is required.

Either way, OP has ABC F socketed, which is likely the wrong config according to either guidance. Reshuffling and testing configs via memtest is probably the easiest fix on OP's side.

EDIT: there is an older thread about this board and its memory issue here on l1techs too. Good reference point.


PS: In that thread, the screenshot in this post has the memory running at 115 degrees :thinking:

The latest info in this saga (that shouldn’t be so poorly documented) is from my Gigabyte Threadripper Pro motherboard that has the same memory descriptors on the motherboard layout as the ASUS one.
The word from Gigabyte Support to my email was to do it sequentially as per the way they are denoted in the manual, which is the same as the ASUS WRX80e (found here; https://dlcdnets.asus.com/pub/ASUS/mb/SocketTRX4/Pro_WS_WRX80E-SAGE_SE_WIFI/E19401_Pro_WS_WRX80E-SAGE_SE_WIFI_UM_V2_WEB.pdf?model=Pro%20WS%20WRX80E-SAGE%20SE%20WIFI )

So that means putting them in A1, B1, C1, D1 for quad mode. Those are all the slots on the left-hand side of the CPU socket when looking at the front edge of the board, or the RAM slots under the CPU socket in a tower config.

HTH.