Follow Up: More Epyc 7551P Testing & Your Questions Answered #level1diagnostic | Level One Techs


This is a companion discussion topic for the original entry at

I am currently building a geophysical HPC workstation around an Epyc 7551P CPU on a Gigabyte MZ31-AR0 motherboard with 8 x 64GB DDR4-2666 LRDIMMs (512GB). I will have a Samsung 970 Pro 512GB system drive, 2 x Gigabyte quad-M.2 storage adapters populated with a total of 8 x 1TB M.2 drives (8TB striped), and 4 x 10TB enterprise HDDs in a double-parity config for just-in-case data storage and backup. I am thinking of getting the Radeon VII at launch, which will be way overkill for my workloads. Oh yeah, OS: Windows 10 Pro.

Wendell, I have watched your videos, read the forums, and studied your 7551P benchmarks. I have also researched AMD's info on Epyc and its I/O, but probably not as much as you have. So I have a couple of questions for you.

  1. Do I need to worry about which components get plugged into which PCIe slots in terms of M.2 drive reads/writes and graphics? Please correct me if I am wrong, but it seems the SoC distributes the I/O equally across all dies and fully balances the load.

  2. Can I assume the same for SATA drives? Since they connect directly to the SoC, I again assume it makes no difference which Slimline SAS-to-SATA connectors the 4 drives are connected to?

  3. Finally, would your benchmarks indicate that I should run Windows 10 + NUMA Dissociator for virtually any load?

  1. Possibly – do some benchmarks. You probably want each of the fast storage cards on a different die. That will probably happen naturally, but you might boot a USB-bootable Ubuntu just so you can run lstopo and see the arrangement of NUMA nodes and peripherals.
  2. Those are so slow it doesn't matter where they are connected.
  3. You probably want to change the memory interleave mode to “Die”, which gives you UMA. UMA seems to be best for Windows. You might take a minor (< 5%) performance hit in some apps, but it guarantees you don't encounter those oddball situations where you hit a 20-30% regression. Really, though, if you take care with your software it's unlikely you'd have a problem in the real world. It's extremely fast compared to other stuff out there. It's just that the optimization is such low-hanging fruit if you are affected by it.
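A quick way to do the lstopo check mentioned above from a live Ubuntu stick, plus a loop that maps each NVMe drive to its NUMA node via sysfs. This is a sketch: the sysfs path shown is the usual Linux layout, but verify it on your kernel, and device names will differ per system.

```shell
# From a USB-bootable Ubuntu: show the full NUMA/peripheral layout first.
# (lstopo ships in the hwloc package: sudo apt install hwloc)
#   lstopo
# Then map each NVMe namespace to the NUMA node of its PCIe device:
for dev in /sys/block/nvme*n1; do
  [ -e "$dev" ] || continue                    # skip cleanly if no NVMe present
  node=$(cat "$dev/device/device/numa_node")   # NUMA node of the underlying PCI device
  echo "$(basename "$dev") -> NUMA node $node"
done
```

If the eight stripe members all land on one or two nodes, moving a quad-M.2 card to a slot wired to a different die should spread them out.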
  1. Check out these Epyc architecture slides (slides 10 & 16 in particular):

Based on slide 10, showing G[0-3], P[0-3]: 16-lane high-speed SerDes links, it would appear that each x16 slot should by design hit a different die… or not. I guess I need a better block diagram from Gigabyte engineering to determine which links each slot uses to avoid conflicts - their block diagram is useless! Supermicro explicitly shows the G[0-3], P[0-3] designations, and each of its three x16 slots is on a different die. Ideally I will have all storage devices (M.2, M.2x4, M.2x4, SATA) hit separate dies.

  1. I agree, but if I can keep them on separate dies, why not?

  2. So no real benefit to NUMA Dissociator? My primary geophysical interpretation and processing app allows me to launch with a memory reservation and will use as many cores as are available. I will leave the standard setting, check Task Manager for excessive core activity, then change the memory interleave and see if there is a difference in speed and resource use.

Still waiting for parts for my build. If you are interested, I will let you know what my results are. I am really curious to see if I wasted my money ($1560) on my 8TB (8 x 1TB) NVMe “drive”. The seismic data is stored in 4GB “bricks”, so I'm hoping my reads will be blazing fast with 500MB pulled from each drive.
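For what it's worth, one crude way to sanity-check the brick reads once the stripe is up, from the same live-Linux environment, is a direct-I/O dd read of a brick-sized file. This is a hypothetical sketch: BRICK is a placeholder path, and iflag=direct is there so the page cache doesn't flatter the number.

```shell
# Hypothetical throughput check for one 4GB seismic brick on the 8-drive stripe.
# BRICK is a placeholder; point it at a real file on the striped volume.
BRICK=${BRICK:-/path/to/brick.file}
if [ -r "$BRICK" ]; then
  # O_DIRECT read so we measure the drives, not RAM; dd reports MB/s on its last line.
  dd if="$BRICK" of=/dev/null bs=1M iflag=direct 2>&1 | tail -n 1
else
  echo "set BRICK to a file on the striped volume"
fi
```

On Windows 10 itself, fio (which has a Windows build) or CrystalDiskMark with a large test size would measure the same thing.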