I have a problem with an enterprise build, so I figured I’d ask the experts. We’re building a workstation for solving computational fluid dynamics (CFD) engineering simulations.
Our relevant hardware config:
Supermicro H11SSL-i motherboard
512 GB registered DDR4 2666
AMD EPYC 7702P (Rome) processor
Seasonic 1000 W Gold PSU
The issue we have is that, for our application (Ansys CFX), we see excellent scaling up to 16 cores. Beyond 16 cores, performance actually drops off (e.g. 56 cores is worse than 28, which is worse than 16).
We’ve ruled out application-specific issues (in fact, we’re running the exact Le Mans benchmark whose published results are online; it says I can’t include links here, so search Google Images for ‘amd epyc rome ansys cfx benchmark’ without the quotes and click the first image).
We’ve tried 3 different fresh OS installs (Windows Server 2019, Windows Server 2016, and CentOS Linux 7.x). Same behavior.
We swapped the motherboard to an ASRock Rack EPYCD8-2T. Same behavior.
We tried a different 1 kW power supply. Same behavior.
Tried tuning the BIOS to the settings shown in that benchmark write-up. Same behavior.
Tried many other separate BIOS tweaks. Same behavior.
We’re in the process of RMAing the CPU now. I’ll keep you updated on that.
Question for the group: Are you all aware of any BIOS settings that might cause poor application scaling performance beyond 16 cores on a 4 NUMA node CPU such as this? Or any other thoughts you might have?
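One thing worth double-checking on the CentOS install is whether the OS is actually seeing the NUMA layout the BIOS claims, and whether explicitly binding the solver to nodes changes the scaling curve. Below is a minimal sketch, assuming a CentOS 7 box with numactl available; `solver_command` is a placeholder for your actual CFX launch line, not a real command:

```shell
#!/bin/sh
# Sketch: verify the NUMA topology the OS sees, then A/B-test explicit binding.
# 'solver_command' is a placeholder for the real CFX launch line.

# 1. With NPS4 the OS should report 4 NUMA nodes (NPS1 would show 1).
lscpu | grep -i numa || true

# 2. numactl (yum install numactl on CentOS 7) can pin a run to specific
#    nodes, so you can compare 16 cores on one node vs. 32 across two:
# numactl --cpunodebind=0 --membind=0 solver_command
# numactl --cpunodebind=0,1 --membind=0,1 solver_command
```

If a 32-core run pinned to two nodes scales cleanly while an unpinned 32-core run does not, that points at process/memory placement rather than a defective CPU.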
Maybe the CPU is defective…but it just feels like there’s something else that needs to be tweaked. It’s almost like the system is power-throttling at higher core counts…though it’s not lost on me that the problem crops up at 17 cores (16 + 1), and that there are 16 cores per NUMA node.
It just feels like we’re going to get the new CPU back on RMA and have the same issues. But I hope not.
Any thoughts/help would be GREATLY appreciated.