If only I could switch to 5800 MT/s memory without buying a new motherboard and processor to run it…
It’s definitely not a downclock; it’s just old. The 4x16 GB sticks I’m running are only rated for 3200, so 3800 is nearly a 20% bump already, and that is near the outer reaches of what I’ve ever managed to stabilize on AM4.
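For anyone checking the “nearly 20%” figure: the bump is just the running speed divided by the rated speed, minus one. A minimal sketch (the helper name is made up for illustration):

```python
def percent_over_rated(running_mts: float, rated_mts: float) -> float:
    """Return how far above its rated speed the memory is running, in percent."""
    return (running_mts / rated_mts - 1.0) * 100.0

# Kit rated for 3200 MT/s, running at 3800 MT/s
print(f"{percent_over_rated(3800, 3200):.2f}%")  # 18.75%
```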
On cost/benefit it’s still doing okay, though: tallying up the costs puts it at roughly the same raw cost as building the “same again” on AM5 (cheaper CPUs but pricier boards and RAM), but, well, it’s had a 26,000-hour head start.
2x Xeon Platinum 8280 (56 cores total), 12 DIMMs, no HT, no virtualization.
Bench / Windows 10 / Ubuntu 24.04
CFD only / 44 min 39 s / 36 min 05 s
EM only / 27 min 34 s / 33 min 54 s
Both / 5 h 44 min 40 s / 5 h 17 min 18 s
A solid boost (10-20%) with Ubuntu on CFD only and Both, despite the strange EM-only result.
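To put numbers on each row, here is a small sketch that parses the wall times as written in the table and reports the percent change in wall time going from Windows 10 to Ubuntu (negative means Ubuntu finished sooner). The helper names are hypothetical; the regex accepts both the “m” and “min” spellings used in the post.

```python
import re

# Seconds per unit; the table mixes "m" and "min"
UNIT_SECONDS = {"h": 3600, "min": 60, "m": 60, "s": 1}

def to_seconds(text: str) -> int:
    """Convert strings such as '5h 44 min 40 s' or '44 m 39 s' to seconds."""
    pairs = re.findall(r"(\d+)\s*(h|min|m|s)\b", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in pairs)

def pct_change(before: float, after: float) -> float:
    """Percent change from before to after; negative means 'after' is faster."""
    return (after - before) / before * 100.0

# (Windows 10, Ubuntu 24.04) wall times from the 2x Xeon 8280 results
results = {
    "CFD only": ("44 m 39 s", "36 min 05 s"),
    "EM only": ("27 m 34 s", "33 min 54 s"),
    "Both": ("5h 44 min 40 s", "5h 17 min 18 s"),
}

for bench, (win, ubuntu) in results.items():
    change = pct_change(to_seconds(win), to_seconds(ubuntu))
    print(f"{bench}: {change:+.1f}% wall time on Ubuntu")
# CFD only: -19.2% wall time on Ubuntu
# EM only: +23.0% wall time on Ubuntu
# Both: -7.9% wall time on Ubuntu
```

Measured as time saved, CFD only lands near the top of the quoted range and Both somewhat below it; expressed as a throughput speedup (Windows time divided by Ubuntu time) the figures come out a bit higher.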
That said, the user experience with Ubuntu 24.04 LTS on this Dell Precision T7920 is not great (lots of tiny bugs). I might use Debian instead.
That’s a really odd result for the EM-only simulation on Ubuntu. If Ubuntu is notably faster than Windows in the more compute-bound simulations (CFD only and CFD-EM), I would have expected it to do better in the more memory-bound simulation (EM only) as well. It makes me wonder whether other distros would show the same discrepancy.
The 10-20% performance uplift with Linux over Windows is in line with my experience as well.
The 9950X result for CFD-only 10 GB is here. I don’t see any improvement over the 7950X numbers shown above in this thread (unless Linux shows a noticeable performance boost). Is this benchmark AVX-512 aware?
Shitdows 11 23H2 with zen5 branch prediction fix,
MSI tomahawk x670e,
Kingston [KF560C32RS-48]x2 running at XMP 6000 CL32,
ARCTIC liquid freezer III 280mm AIO.
I run at stock settings, with all unnecessary background apps disabled and the COMSOL executable set to high priority (this got me +300 points in Cinebench R23). I’ll run the multiphysics model later, when I need to spend half a day outdoors.
This is very likely the case. @northstrider benchmarked a 7950X at 40m44s on Windows 11 a ways up in the thread (all the other 7950X results in the thread were on Linux); I’m betting Linux would show a ~20% performance uplift over Windows.
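A rough projection of what that Windows 7950X run might look like on Linux, assuming the ~20% uplift guessed above (an assumption, not a measurement). “20% uplift” can be read two ways, so this sketch shows both:

```python
def fmt(seconds: float) -> str:
    """Format a duration in seconds as 'M min SS s'."""
    s = round(seconds)
    return f"{s // 60} min {s % 60:02d} s"

windows_7950x = 40 * 60 + 44  # 40m44s on Windows 11
uplift = 0.20                 # assumed Linux advantage (a guess from the thread)

# Reading 1: Linux needs 20% less wall time
print(fmt(windows_7950x * (1 - uplift)))  # 32 min 35 s
# Reading 2: Linux delivers 20% more throughput (time divided by 1.2)
print(fmt(windows_7950x / (1 + uplift)))  # 33 min 57 s
```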
The benchmark is AVX-512 aware regardless of which BLAS is chosen on x86 (on Arm, though, it needs to use the ArmPL BLAS to run wide SIMD instructions like SVE2).
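Whether the benchmark can use AVX-512 also depends on the CPU and OS exposing it. On Linux, a quick sanity check is to look for `avx512*` entries on the `flags` line of `/proc/cpuinfo`. This is a minimal sketch assuming an x86 Linux system; it only shows what the kernel reports, not which code path COMSOL or its BLAS actually picks:

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()  # e.g. Arm, where the line is called 'Features'

def avx512_subsets(cpuinfo_text: str) -> list[str]:
    """List the AVX-512 subsets (avx512f, avx512bw, ...) the kernel reports."""
    return sorted(f for f in cpu_flags(cpuinfo_text) if f.startswith("avx512"))

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        print(avx512_subsets(path.read_text()) or "no AVX-512 flags reported")
```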
And this is the WSL2 Ubuntu 24.04 LTS result. Phoronix has shown a few times that WSL can provide a benefit, but that is not the case here. Performance degraded from 33:48 to 35:52, and I can see strange utilization on the graph: the last 4 threads were poorly utilized. Also, memory consumption went up to 18 GB instead of 11 GB, and for some reason GPU utilization went up to 30%.
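For scale, the WSL2 regression and the extra memory use work out as follows (a small sketch; `mmss` is a hypothetical helper for the minutes:seconds times in the post):

```python
def mmss(t: str) -> int:
    """Convert a 'MM:SS' string to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

baseline, wsl2 = mmss("33:48"), mmss("35:52")
print(f"WSL2 wall time: {(wsl2 - baseline) / baseline:+.1%}")  # +6.1%
print(f"Peak memory: {(18 - 11) / 11:+.1%}")                  # +63.6%
```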
Oh this is interesting, I never thought to run it inside WSL2.
It was my understanding that WSL2 was a true virtual machine, unlike WSL1, which just translated Linux system calls into Windows kernel calls, so I would have assumed WSL2 would give decent performance… but I suppose there is overhead to the virtualization itself; or perhaps, since WSL2 uses Hyper-V, Microsoft’s weird thread scheduling behavior reared its head again.
Yeah… me too. My previous run was with the hypervisor enabled, but I recently updated my BIOS and have made a few tweaks since that run. These are back-to-back runs with only the hypervisor toggled using bcdedit, as described here:
In my experience it’s the other way around: for compute-intensive workloads, WSL(1) all the way. WSL2 has a fully virtualized memory model, while WSL1 only degrades vs native in syscalls (which, n.b., can even be faster than native Linux depending on complexity and drivers).
Clean Ubuntu 24.04.1 LTS installation (with “sudo apt upgrade” and the power plan changed to performance). A microscopic +1.5% improvement over Windows 11, which is strange.
This is very strange, I don’t think I’ve ever seen Windows and Linux effectively get the same performance on the same hardware.
Maybe Linux needs a kernel upgrade to effectively schedule on Zen5 CPUs?
Slightly interesting results from this 9700X. It’s a MicroCenter combo, so 32 GB RAM in a Gigabyte B650, 6000 CL32 with a 2000 MHz IF, and a cute little Wraith Stealth (I think; the copper-slugged variant from what started life as a 1500X system). It’s almost certainly the AVX-512 causing it, but on my 5950X I saw CPU temps a solid 15 °C cooler throughout the benchmark than under a “generic” all-thread workload from GtkStressTesting; here that was reversed quite a bit, with the CFD bench holding 88-94 °C vs a consistent maximum of 87-88 °C for all-thread. I’m currently on 142/75/150 PBO limits (so basically 105 W TDP) and the default fan curve; I believe there’s considerably more fan noise to be had, which may drop temps further.
Edit to add: 39m19s on default settings + EXPO, with the iGPU disabled and a 6700 dGPU. With the above-mentioned PBO settings, it regressed to 39m24s just from installing a dGPU, though it was also on decidedly slow storage at that time.
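Where “142 W PPT is basically 105 W TDP” comes from: on AMD’s desktop parts the socket power limit (PPT) is usually set to 1.35x the marketed TDP (65 W → 88 W, 105 W → 142 W). A sketch, assuming the 142/75/150 triple is PPT/TDC/EDC in the usual BIOS order:

```python
PPT_TO_TDP_RATIO = 1.35  # AMD's usual PPT = 1.35 x TDP relationship

def tdp_from_ppt(ppt_watts: float) -> float:
    """Estimate the TDP class that a given PPT limit corresponds to."""
    return ppt_watts / PPT_TO_TDP_RATIO

ppt, tdc, edc = 142, 75, 150  # assumed PPT (W), TDC (A), EDC (A)
print(f"~{tdp_from_ppt(ppt):.0f} W TDP")          # ~105 W TDP
print(f"stock 65 W part: {round(65 * PPT_TO_TDP_RATIO)} W PPT")  # 88 W PPT
```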
Phoronix is using the same version of Linux and getting decent results on the benchmarks they run.
This is just a guess based on all the possibly inaccurate information floating around the internet about Zen 5, but perhaps the whole cross-CCD latency issue is showing up here.
One way to test whether this is causing the lower-than-expected score would be to keep the two CCDs’ work and memory separate while they compute, so that cross-CCD communication is kept to a minimum (at least in theory).
This command should accomplish that: ./comsol62_bechmark_CFD_only_10GB_Linux_x86-64.sh -numasets 2
A 15 s improvement. And I’m not sure whether NUMA2 actually works, because there are no NPS-related settings in my motherboard BIOS. The solver inside COMSOL may not recognize Zen 5 and may be using code paths for a generic x86 processor.
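One way to see what memory topology the OS actually reports is to count the NUMA nodes the Linux kernel exposes under `/sys/devices/system/node`. On a desktop board without NPS options I would expect a single `node0`; what COMSOL’s `-numasets 2` does in that case is not something this sketch can tell you. A minimal, Linux-only sketch:

```python
from pathlib import Path

def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> list[str]:
    """Return the NUMA node directories (node0, node1, ...) the kernel exposes."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs not available
    return sorted(
        (p.name for p in root.glob("node*") if p.is_dir() and p.name[4:].isdigit()),
        key=lambda name: int(name[4:]),  # numeric order: node2 before node10
    )

if __name__ == "__main__":
    nodes = numa_nodes()
    print(f"{len(nodes)} NUMA node(s): {', '.join(nodes) or 'none found'}")
```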
Just wild guessing here, but shouldn’t CPU recognition only be relevant to the BLAS being used?
Maybe you could pass your system’s own BLAS with -blas path -blaspath /path/to/local/blas
We need to find one more person with a 9950X to confirm this strange behavior. My intuition tells me this test should be completable in about 26 minutes, at the performance level of a 24-core Zen 4 Threadripper, but something is happening on the software side.