CFD-Multiphysics Benchmark for x86 and ARM (Windows/macOS/Linux)

7950X3D DDR6200 CO -15
-numasets 2
Win 11



Ran the ~10GB CFD bench on a few of my machines, for giggles and/or reference points. In chronological order by CPU release date:

| Time Taken | Hardware | Operating System | Notes |
| --- | --- | --- | --- |
| 1h 52m 52s | Xeon 2630L v4 (10c/20t 65W TDP), 256GB (4x64) DDR4 ECC LRDIMM @ 2133 | Win10 Pro | OS resided on a Samsung 950 Pro NVMe drive. CPU steady states around ~45W per HWInfo64. |
| 1h 28m 25s | Xeon 2630L v4 (10c/20t 65W TDP), 256GB DDR4 ECC LRDIMM @ 2133 | Pop!_OS 22.04 | Same OS install and same BX500 SATA SSD as the T470’s result set below |
| 3h 48m 45s | Thinkpad T470 - i5 7200U (2c/4t), 64GB RAM @ 2133 | Pop!_OS 22.04 | Definitely good for email and browsing :yay: |
| 2h 32m 26s | Thinkpad T480 - i5 8350U (4c/8t), 64GB RAM @ 2400 | Pop!_OS 22.04 | Same 3200-capable RAM kit as was in the T470 |
| 1h 43m 8s | Lenovo Legion 5 laptop - Ryzen 5600H (6c/12t), 64GB DDR4 @ 3200 | Win11 Home | Quiet power plan (~25W CPU package power at steady state per HWInfo64). |
| 1h 32m 6s | Lenovo Legion 5 laptop - Ryzen 5600H (6c/12t), 64GB DDR4 @ 3200 | Win11 Home | High Performance power plan (~45-50W CPU package power at steady state per HWInfo64) |
| 0h 41m 59s | Ryzen 7800X3D (8c/16t), 64GB DDR5 @ 6000 | Pop!_OS 22.04 | Several background tasks present during the run. Machine required -3drend sw to run |

I included the T470 and T480 results because the 7200U and similar chips are the current cheap, cast-off, low-power business laptop hardware, especially with the flood of mandatory Win11 hardware upgrades we’re likely to see over the next couple of years. They’re mostly there as homelab hobbyist notes on what kind of compute you get for the money. As a comparison point, at least per Passmark the 7200U is roughly equivalent in compute to a Raspberry Pi 5 and the N5105s of the world (source). Still, it was interesting to see that the scaling wasn’t linear with core count between Kaby Lake and its refresh, even with the bump in memory frequency. I didn’t have the tooling to confirm whether the 8350U had dropped back to base clock (1.7 GHz vs 2.5 GHz for the 7200U), but it probably had.
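To put a number on that non-linear scaling, here is a minimal Python sketch using only the T470 and T480 wall times from the table. The helper name `to_seconds` is made up for illustration, and "scaling efficiency" here is simply speedup divided by the core-count ratio, which ignores the clock and memory-speed differences between the two chips.

```python
import re

def to_seconds(text: str) -> int:
    """Convert a wall time like '3h 48m 45s' to seconds."""
    parts = {unit: int(num) for num, unit in re.findall(r"(\d+)\s*([hms])", text)}
    return parts.get("h", 0) * 3600 + parts.get("m", 0) * 60 + parts.get("s", 0)

t470 = to_seconds("3h 48m 45s")  # i5 7200U, 2c/4t
t480 = to_seconds("2h 32m 26s")  # i5 8350U, 4c/8t

speedup = t470 / t480          # how much faster the T480 finished
core_ratio = 4 / 2             # physical cores, T480 vs T470
efficiency = speedup / core_ratio

print(f"speedup: {speedup:.2f}x on {core_ratio:.0f}x the cores")
print(f"scaling efficiency: {efficiency:.0%}")
```

Doubling the cores bought roughly 1.5x here, so about 75% efficiency by this crude measure.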

Past that, it’s also interesting to see how the typical recent-ish gamer laptop and/or mini PC CPU does against one of the lowest-power v4 Xeons, especially when the power budget was scaled down on the Ryzen. X99 CPUs are cheap, but it’s not a platform I’d buy into now if I hadn’t already had one collecting dust in the corner. That perf uplift from swapping OSes, though… I’m not entirely surprised, other than by just how much it moved.
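For anyone who wants the OS uplift and the power comparison as numbers, here is a rough sketch built from the table values. It assumes the steady-state package power reported by HWInfo64 held for the whole run, and it counts CPU package energy only (no RAM, drives, or screen), so treat the watt-hour figures as ballpark.

```python
from datetime import timedelta

# Wall times and approximate CPU package power from the results table.
xeon_win10 = timedelta(hours=1, minutes=52, seconds=52)   # ~45W
xeon_popos = timedelta(hours=1, minutes=28, seconds=25)
ryzen_quiet = timedelta(hours=1, minutes=43, seconds=8)   # ~25W, Quiet plan

# Dividing one timedelta by another gives a plain float ratio.
os_speedup = xeon_win10 / xeon_popos
print(f"Xeon, Pop!_OS vs Win10: {os_speedup:.2f}x faster")

def package_energy_wh(run: timedelta, watts: float) -> float:
    """CPU package energy in watt-hours, assuming constant power draw."""
    return watts * run.total_seconds() / 3600

print(f"Xeon Win10 at ~45W:  {package_energy_wh(xeon_win10, 45):.0f} Wh")
print(f"5600H Quiet at ~25W: {package_energy_wh(ryzen_quiet, 25):.0f} Wh")
```

By this estimate the Linux run on the Xeon is about 1.28x faster than the Windows run, and the 5600H on the Quiet plan finishes sooner than the Xeon on Win10 while using roughly half the package energy.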

On the -3drend sw front, the only box that required it was an AMD GPU under Linux (7900XT). The X99 has an Intel Arc A380, the Thinkpads have Intel iGPUs, and the gaming laptop is not in hybrid mode and has an RTX 3060. For what that’s worth, anyway. :slightly_smiling_face:

reran with faster memory settings

8000 RAM, less CPU

Interesting that this shows running 6600 at 1:1 gives the best performance, even if you can hit 8000 on 16GB single-rank RAM. My dual-rank 32GB DIMMs max out at about 7000, so that’s not an option for me anyway, but it’s good to know I’m not missing out.

Can you also run your 6600 1:1 config with the -numasets 4 option, where 4 is the number of CCDs in the chip? You just make a shortcut, add the option to it, and then run from the shortcut.

In my system this looks to improve performance, but would like to see that confirmed.
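For anyone who would rather script the launch than edit a shortcut, a minimal sketch of the same idea in Python follows. The only thing taken from this thread is the `-numasets <n>` option; the executable name `comsol` and the helper `comsol_command` are assumptions for illustration, so substitute the real path on your machine.

```python
import shlex

def comsol_command(executable: str, ccd_count: int) -> list[str]:
    """Build a launch command that asks for one NUMA set per CCD."""
    if ccd_count < 1:
        raise ValueError("ccd_count must be at least 1")
    return [executable, "-numasets", str(ccd_count)]

# Hypothetical executable name; use the full path to your install instead.
cmd = comsol_command("comsol", 4)
print(shlex.join(cmd))

# To actually launch it, something like:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

On a 2-CCD part like the 7950X3D the same call would use 2, matching the -numasets 2 result posted earlier.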

Are you sacrificing timings and memory latency to get 8000 MT/s? I believe this test benefits most from low latency, due to the frequent memory accesses to small blocks of numeric data for the stencil values. Am I wrong @twin_savage?
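As a rough illustration of the trade-off being asked about, first-word CAS latency in nanoseconds is the CL cycle count divided by the memory clock, and the memory clock is half the data rate. The CL values below are hypothetical, not the timings anyone in this thread actually ran, and CAS latency is only one part of total memory latency (the memory-controller ratio, 1:1 or otherwise, is not modeled here).

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    """First-word CAS latency in ns: CL cycles at a clock of data_rate/2 MHz."""
    return cl_cycles * 2000 / data_rate_mts

# Hypothetical kits: CL values are illustrative, not taken from the posts.
kits = {"6600 CL32": (32, 6600), "8000 CL40": (40, 8000)}
for name, (cl, rate) in kits.items():
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns")
```

With these made-up timings the 8000 kit ends up with slightly higher CAS latency than the 6600 kit despite the higher data rate, which is the kind of sacrifice the question is getting at.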

That profile I ran had a lower CPU clock speed, so I’m thinking CPU boost matters more. Will test more soon.

The CFD-only benchmark isn’t severely memory bound like the other two benchmarks. It still loves more cores and especially frequency; of course, fast memory helps it too.

I’m wondering if the difference in performance on 7960X is coming from the single rank memory @MSIMAX_OVERCLOCKER is running vs the dual rank memory @RMM is running?

Didn’t some of the older Zen architectures get a big benefit from running dual rank over single rank in the past?

It’s interesting that splitting the CPU up into 4 logical sockets (at the user space level) helps it this much. Makes me wonder if setting NPS=4 in the BIOS would have the exact same effect. Theoretically it should, since it’s supposed to be doing the same thing, just at the BIOS level instead of the operating system level.
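One way to check whether a BIOS-level NPS setting actually took effect is to count the NUMA nodes the OS sees. The sketch below is Linux-only and reads the standard sysfs node directory; the function name `count_numa_nodes` is made up for illustration, and the expectation that NPS=4 shows up as four nodes is an assumption rather than something tested in this thread.

```python
from pathlib import Path
import re

def count_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> int:
    """Count nodeN directories under the Linux sysfs NUMA tree (0 if absent)."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return 0
    return sum(
        1
        for entry in root.iterdir()
        if entry.is_dir() and re.fullmatch(r"node\d+", entry.name)
    )

if __name__ == "__main__":
    print(f"NUMA nodes visible to the OS: {count_numa_nodes()}")
```

If this reports 1 with NPS left at its default, then -numasets is doing the partitioning purely in user space, as described above.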

For DDR5, dual rank doesn’t matter much because of the dual-channel nature of each DIMM. With DDR4 it had a big impact, but here it could be a percent or two. I think the main difference is due to manual OC vs PBO: I am averaging >5.2 GHz all-core, while because of the all-core limiter, the best case without manual OC is 4.8-5 GHz depending on your boost adder in PBO. With a +200 MHz boost you can hit 5 GHz, best-case scenario, with PBO.

Not sure why, but on my AERO TRX50 there is no NPS option in the BIOS.

From testing on my own CFD + HT thermal problems as well, -numasets 4 is consistently a little faster, as is MKL, so that will be my daily setup.

It is interesting that for EM stuff Intel seems faster, like the inductor demo model and your own EM+CFD model, but for CFD and thermal AMD seems to have the advantage in CPU.

Testing now with my PBO settings, which land me at 5.5 GHz all-core loaded boost, with memory at 7000 1:1.

done

Ran the ~260GB EM-only bench on the x99 box, mostly cause I was curious if I even could with 256GB RAM and the core count I have:

| Time Taken | Hardware | Operating System | Notes |
| --- | --- | --- | --- |
| 1h 1m 6s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Win10 Pro | CPU steady state is 2.0 GHz at the 45W power draw noted in the earlier results. This run only needed ~211 GB physical and ~223 GB virtual memory per the output window. |
| 1h 3m 5s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Pop!_OS 22.04 | This run needed ~233 GB physical and ~245 GB virtual memory per the output window. |

And ran the 60GB CFD-EM one on the hardware that still could (i.e. no sense swapping the ram back into the 7200U :yay:). Both solvers just in case something wacky popped out:

| Time Taken | Hardware | Operating System | Notes |
| --- | --- | --- | --- |
| 21h 49m 47s | Thinkpad T480 - i5-8350U (4c/8t), 64GB DDR4-2400 | Pop!_OS 22.04 | PARDISO run. Had Vitals set up this time, CPU floats between 2.3-3.2 GHz during the run - rather than leveling out at the 1.7 base clock as previously thought. |
| 20h 48m 26s | Thinkpad T480 - i5-8350U (4c/8t), 64GB DDR4-2400 | Pop!_OS 22.04 | FGMRES run |
| 16h 31m 50s | Lenovo Legion 5 - Ryzen 5600H (6c/12t), 64GB DDR4-3200 | Win11 Home | PARDISO run. High Perf. power plan. |
| 16h 18m 53s | Lenovo Legion 5 - Ryzen 5600H (6c/12t), 64GB DDR4-3200 | Win11 Home | FGMRES run. High Perf. power plan. |
| 11h 27m 40s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Pop!_OS 22.04 | PARDISO run |
| 12h 1m 1s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Pop!_OS 22.04 | FGMRES run |
| 13h 21m 2s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Win10 Pro | PARDISO run |
| 13h 27m 49s | Xeon 2630L (10c/20t 65W TDP), 4x64GB DDR4-2133 ECC LRDIMMs | Win10 Pro | FGMRES run |
| 7h 2m 8s | Ryzen 7800X3D (8c/16t), 64GB DDR5-6000 | Pop!_OS 22.04 | PARDISO run. Did not require -3drend sw for this one. |
| 7h 0m 18s | Ryzen 7800X3D (8c/16t), 64GB DDR5-6000 | Pop!_OS 22.04 | FGMRES run. Did not require -3drend sw for this one. |

I’m not sure what to make of the CFD-EM results, honestly. The Xeon and the 5600H seem to have spread out more on this one than the CFD-only, and in the other direction. Additionally, which solver is faster seems to flip flop, too. But likewise, for this particular workload the 7800X3D is getting smoked by the M1 Max that was posted upthread. :person_shrugging:
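To make the solver flip-flop easier to see, here is a small sketch that takes the CFD-EM wall times from the table and reports which solver won on each machine and by how much. These are single runs, so small gaps may well be within run-to-run noise; nothing here measures that variance.

```python
# (machine, OS) -> (PARDISO, FGMRES) wall times as (h, m, s), from the table.
runs = {
    ("T480 i5-8350U", "Pop!_OS"): ((21, 49, 47), (20, 48, 26)),
    ("Legion 5 5600H", "Win11"): ((16, 31, 50), (16, 18, 53)),
    ("Xeon 2630L", "Pop!_OS"): ((11, 27, 40), (12, 1, 1)),
    ("Xeon 2630L", "Win10"): ((13, 21, 2), (13, 27, 49)),
    ("7800X3D", "Pop!_OS"): ((7, 2, 8), (7, 0, 18)),
}

def seconds(hms):
    """Convert an (h, m, s) tuple to seconds."""
    h, m, s = hms
    return h * 3600 + m * 60 + s

def faster_solver(pardiso, fgmres):
    """Return the faster solver and its lead as a percentage of the slower time."""
    p, f = seconds(pardiso), seconds(fgmres)
    name = "PARDISO" if p < f else "FGMRES"
    return name, abs(p - f) / max(p, f) * 100

for (machine, os_name), (pardiso, fgmres) in runs.items():
    name, lead = faster_solver(pardiso, fgmres)
    print(f"{machine:15s} {os_name:8s} {name:8s} faster by {lead:.1f}%")
```

PARDISO only comes out ahead on the Xeon (under both OSes), FGMRES wins on the other three machines, and every gap is under 5%.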

I think this flip in performance is coming from the Xeon’s superior threaded memory bandwidth; the CFD-EM is very memory starved, and the EM-only is basically a memory bandwidth benchmark.
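A back-of-the-envelope check on the bandwidth argument: theoretical peak DRAM bandwidth is data rate times 8 bytes per 64-bit channel times the number of channels. The channel counts below are assumptions (quad channel for the Xeon with its four DIMMs, dual channel for the laptop), and real sustained bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gbs(data_rate_mts: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channel = 8 bytes)."""
    return data_rate_mts * bus_bytes * channels / 1000

# Channel counts are assumptions: quad channel for the 4-DIMM Xeon, dual for the laptop.
systems = {
    "Xeon 2630L, DDR4-2133 x4": (2133, 4),
    "Ryzen 5600H, DDR4-3200 x2": (3200, 2),
}
for name, (rate, channels) in systems.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

Under those assumptions the old Xeon has roughly a third more peak bandwidth than the 5600H despite the slower DIMMs, which lines up with it pulling ahead on the memory-starved CFD-EM run.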

I’ve been fairly impressed with the Apple silicon. I’m this :pinching_hand: close to going over to the Mac Studio subreddit and pestering someone with an M2 Ultra to run the CFD-EM benchmark.
If that little desktop can beat the best single socket x86 has to offer, I think I might buy one of those for my next upgrade.

This is probably at least partially due to the difference in memory concurrency between disparate cores on Intel and AMD architectures.

This inspired me to set up a dual boot with Pop!_OS Linux, and it is indeed faster. I guess COMSOL is just faster on Linux?

CFD-EM: 3h 31m
CFD Only: 17m 54s (first into the 17s! Like drag racing, LOL)

With -3drend sw, though, it doesn’t seem usable to me for running Linux daily. I have a 6800XT GPU; it looks like AMD GPUs are not supported?

It seems like the hardware accelerated graphics problem is distro dependent rather than hardware dependent. AMD GPUs have always worked for me in the past.

Wendell mentioned that when he installed Oracle Java it would run without the -3drend sw argument, after having previously encountered the graphics error.
So perhaps you can get hardware accelerated graphics working if you use the Oracle Java?

Not just COMSOL but most of the intensive calculation software.
Probably better scheduling and/or access to hardware.

Interesting that for the small problem the 5600H is on par with the Xeon despite having fewer cores.
But for the larger problem, as pointed out by twin_savage, the Xeon takes the lead, probably due to better bandwidth (maybe cache too).
Have you run the 7800X3D on the other version of the bench?

The AOCL BLAS? No, not yet. I’ll give it a try and report back.