I built this so I could compile Rust code faster, specifically this repo: github.com/paritytech/polkadot-sdk
But for some reason, compiling this repo is 10 times slower on my PC than on my 64GB M1 Pro MacBook.
cargo check --all --benches takes about 5 minutes on my Mac and 50 minutes on my PC. I’m beyond frustrated at this point. Do I need a new motherboard? It seems the Sage SE II has PBO, which should help with overclocking this CPU, right? Or is there something I’m doing wrong with my setup? I’ve tried both the acpi-cpufreq and amd-pstate drivers, and nothing comes even close to my MacBook’s performance.
I see that you only have 4 DIMM slots populated on a motherboard with an 8-channel memory architecture. That halves the theoretical memory bandwidth and will have a significant impact on overall performance.
Also, I’d look at the storage setup. It looks like you are using M.2 drives, likely consumer-grade ones.
It’s going to take a lot of I/O to feed those 64 puppies in your 5995WX.
So I had a feeling about the SSD: during my benchmarks, SSD speeds seem to be throttled all the way down to 85 MB/s (jfc).
I’m using the Samsung 990 PRO 2TB, the version without a heatsink. The motherboard has its own heatsink, and the SSD temperature never goes above 50°C, so I’m not sure why it’s throttling.
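To put a rough number on it: a DDR4 channel moves 8 bytes per transfer, so theoretical peak bandwidth scales linearly with the number of populated channels. The sketch below assumes DDR4-3200 DIMMs (the actual speed isn’t given in the thread), and the function name is just for illustration.

```python
# One DDR4 channel is 64 bits (8 bytes) wide, ECC bits not counted.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gb_s(mega_transfers_per_s: int, channels: int) -> float:
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    Real sustained throughput is always lower than this figure.
    """
    bytes_per_s = mega_transfers_per_s * 1_000_000 * BYTES_PER_TRANSFER * channels
    return bytes_per_s / 1e9


# Assumption: DDR4-3200 on every populated channel.
eight_channels = peak_bandwidth_gb_s(3200, 8)
four_channels = peak_bandwidth_gb_s(3200, 4)
```

Under that assumption, all eight channels give 204.8 GB/s, while four give 102.4 GB/s to share across 64 cores. That is the best case, which requires the four DIMMs to actually interleave.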
What kind of disk activity does that benchmark consist of?
85 MB/s is spot on for the low-queue-depth 4K random read limit of the Samsung 990 PRO.
If your tests don’t need much space, mount the temporary work directory as a memory-backed tmpfs and rerun them.
The good thing about this platform is that you can remove the bottleneck easily.
Your board should be trivially compatible with the ASUS Hyper M.2 x16 Gen 4 AIC.
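For a Cargo build, much of the small-file I/O lands in the target directory. So one way to try the tmpfs idea without moving the whole checkout is to point CARGO_TARGET_DIR at a RAM-backed path. This is a minimal sketch. It assumes /dev/shm is a tmpfs mount (the default on most systemd-based distros, Ubuntu included) with enough free RAM. The helper names and the target directory name are hypothetical. It uses --workspace, which is the current name for cargo’s --all flag.

```python
import os
import subprocess
import time
from pathlib import Path

# Assumption: /dev/shm is tmpfs and has room for the whole target directory.
TMPFS_ROOT = Path("/dev/shm")


def tmpfs_build_env(tmpfs_root: Path = TMPFS_ROOT, name: str = "polkadot-sdk-target") -> dict:
    """Copy of the current environment with Cargo's build output redirected to tmpfs."""
    env = dict(os.environ)  # copy, so os.environ itself is left untouched
    env["CARGO_TARGET_DIR"] = str(tmpfs_root / name)
    return env


def timed_cargo_check(repo: Path, env: dict) -> float:
    """Run `cargo check --workspace --benches` in `repo` and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        ["cargo", "check", "--workspace", "--benches"],
        cwd=repo,
        env=env,
        check=True,  # raise if the build fails
    )
    return time.perf_counter() - start
```

Usage would be something like timed_cargo_check(Path("polkadot-sdk"), tmpfs_build_env()). In this variant, the source tree and the ~/.cargo registry still live on the SSD, so it only isolates the build-output side of the I/O.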
Also some light reading to brighten your day:
EDIT: I just noticed your physical memory layout, and it looks kind of weird. I have never seen an asymmetrical population pattern give working multichannel operation.
Are you sure you are actually running in quad-channel mode? The motherboard manual gives no guidance at all on the proper slot population order, which is weird.
As pointed out in the comments above, it is recommended to populate all 8 DIMMs. When using fewer than 8 DIMMs, you can try the following configuration recommended by AMD:
The manual is distributed by Gigabyte for EPYC, but it is written by AMD and applies to the ASUS WRX80E and Threadripper as well. The four-channel configuration is explained on page 9.
So I tried the memory-backed tmpfs, and my benchmarks are now 2.5 times faster than on my M1 Pro MacBook. It’s clear that my SSD is throttling for whatever reason.
My next step will be to run my benchmarks on Windows to check for throttling; that way I can rule out driver issues on Ubuntu. Hopefully I don’t have to replace my SSDs.
For best performance, AMD recommends populating all eight memory channels per socket, with every channel having the same capacity.
If a customer chooses to populate only four channels in the system, AMD recommends limiting the processors in that system to those with 128MB or less of L3 and populating channels CDGH identically. This will enable four-way interleaving, which will generally provide the best performance with a four DIMM population.
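The AMD guidance quoted above boils down to a few checkable rules. These are: all eight channels with equal capacity, or exactly C, D, G, H with equal capacity and a CPU with at most 128MB of L3. Here is a sketch of those rules as a hypothetical check_population helper. It assumes you already know which channel letter each DIMM sits in, which is exactly what is unclear from OP’s photos.

```python
ALL_CHANNELS = frozenset("ABCDEFGH")
FOUR_DIMM_CHANNELS = frozenset("CDGH")  # per the AMD guidance quoted above
MAX_L3_MB_FOR_FOUR_CHANNELS = 128


def check_population(dimms: dict, l3_mb: int) -> list:
    """Return deviations from AMD's guidance; an empty list means it matches.

    dimms maps a channel letter ("A".."H") to that DIMM's capacity in GB.
    l3_mb is the CPU's total L3 cache in MB.
    """
    problems = []
    unknown = set(dimms) - ALL_CHANNELS
    if unknown:
        problems.append("unknown channel(s): " + ", ".join(sorted(unknown)))
    populated = frozenset(dimms) & ALL_CHANNELS
    if len(populated) == 4:
        if populated != FOUR_DIMM_CHANNELS:
            problems.append("4 DIMMs in " + "".join(sorted(populated)) + ", guidance says CDGH")
        if l3_mb > MAX_L3_MB_FOR_FOUR_CHANNELS:
            problems.append(f"4-channel population not recommended with {l3_mb} MB of L3")
    elif len(populated) != 8:
        problems.append(f"{len(populated)} channels populated; guidance covers 8 (best) or 4")
    # "Identically" / "same capacity": every populated channel should match.
    if len(set(dimms.values())) > 1:
        problems.append("channels do not all have the same capacity")
    return problems
```

For a 5995WX (256MB of L3), every four-DIMM layout gets flagged by the L3 rule, even CDGH. That lines up with the advice in the thread to fill all eight channels.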
I am pretty sure OP’s config is not correctly populated. It looks like it’s either BCD F or C GHF.
Either way, this physical configuration will run either in an inefficient quad-channel mode or, more likely, in a single-channel configuration.
Since you’re comparing different architectures, operating systems, etc., I think you should at least compare your results against something using the same architecture before concluding that something is wrong.
I don’t have the ability to port polkadot-sdk, but I did try something quite large: sccache 0.6.4 built with Rust 1.73.0.
With the following settings:
CARGO_PROFILE_RELEASE_LTO="true"
CARGO_PROFILE_RELEASE_PANIC="abort"
CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1
Ryzen 7900 (bare metal)
2m5.67s real 6m55.76s user 13.63s sys
Neoverse N1 (4x @ 3GHz) VM
7m13.96s real 16m19.09s user 1m0.73s sys
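One way to read these numbers is (user + sys) / real, which approximates how many cores were kept busy on average over the build. Here is a small sketch with a hypothetical parse_duration helper for the time-style strings above.

```python
import re


def parse_duration(text: str) -> float:
    """Parse strings like '2m5.67s' or '13.63s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def avg_busy_cores(real_s: str, user_s: str, sys_s: str) -> float:
    """Average number of cores kept busy: total CPU time divided by wall time."""
    return (parse_duration(user_s) + parse_duration(sys_s)) / parse_duration(real_s)


# (real, user, sys) as reported above.
runs = {
    "Ryzen 7900 (bare metal)": ("2m5.67s", "6m55.76s", "13.63s"),
    "Neoverse N1 (4x @ 3GHz) VM": ("7m13.96s", "16m19.09s", "1m0.73s"),
}

for name, (real_s, user_s, sys_s) in runs.items():
    busy = avg_busy_cores(real_s, user_s, sys_s)
    print(f"{name}: {parse_duration(real_s):.2f}s wall, ~{busy:.2f} cores busy on average")
```

This gives roughly 3.4 busy cores for the 12-core Ryzen and roughly 2.4 of 4 for the VM, with the VM taking about 3.45 times as long in wall-clock time. With LTO on and a single codegen unit, each crate’s optimization and codegen run on one thread. So neither machine is close to fully utilized, and the ratio says more about per-core speed than core count.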
Weird, your recommended 4-DIMM config differs from Gigabyte’s guidance:
GIGABYTE: CD + HG (same for boards with 16 DIMM slots)
Your source: AC + EG
Maybe Gigabyte’s is older guidance that assumes only half of the chiplets on the CPU are present (i.e., up to a 32-core config)? That would explain the asymmetry. Testing is required.
Either way, OP has ABC F populated, which is likely the wrong config under either guidance. Reshuffling the DIMMs and testing each config with memtest is probably the easiest fix on OP’s side.
EDIT: there is an older thread about this board and this memory issue here on L1Techs too. Good reference point.
So that means putting them in A1, B1, C1, D1 for quad-channel mode, i.e., all the slots on the left-hand side of the CPU socket when looking at the front edge of the board, or the RAM slots below the CPU socket in a tower configuration.