Getting the most from your new Epyc Server

*Thanks to GIGABYTE for loaning me the G221-Z30 and the R282-Z90 to help make these guides. Learn more about their Epyc server offerings here.*

With a maxed-out dual Epyc 7742 config (128 cores and 256 threads), there is little software out there that can take full advantage of that scale right out of the box.

Things are a little different at this scale.

While scientific and rendering software can usually scale to that many cores reasonably well, not much business software can really take advantage of them without virtual machines or another scaling strategy.

A full set of eye-watering benchmarks is here:
https://openbenchmarking.org/result/1910057-SP-1909192AS94

As we learned from the first video in the series, cores this fast (and dense) can easily replace entire racks of legacy machines. I used VMware to convert physical servers to virtual ones (P2V) and to migrate existing VMs to the new hardware, and while VMware is a great product, other virtualization platforms like Proxmox can be highly effective.

Enter Proxmox

How do you get the most out of this much hardware? As we saw in the benchmarks, we could run 5-6 instances of the Indigo Renderer, for example, to keep the whole machine busy. These Epyc CPUs are so good at context switching that, for my particular workloads, I was able to oversubscribe CPUs to virtual machines at a higher ratio than on the older hardware while still maintaining superior performance.
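As a rough sketch of what that oversubscription looks like in practice, here is a minimal example using the third-party proxmoxer Python library to create a handful of VMs whose combined vCPU count exceeds the physical core count. The hostname, credentials, node name, VM IDs, storage, and the 1.25x ratio are placeholders for illustration, not values from my setup, and the creation parameters may need adjusting for your Proxmox VE version.

```python
# Sketch: handing out more vCPUs than physical cores across Proxmox VMs.
# Assumes the third-party "proxmoxer" library (pip install proxmoxer requests)
# and a reachable Proxmox VE host; everything below is a placeholder value.
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI(
    "pve.example.lan",           # placeholder hostname
    user="root@pam",
    password="change-me",        # prefer an API token in real deployments
    verify_ssl=False,
)

NODE = "pve1"                    # placeholder node name
PHYSICAL_CORES = 128             # dual Epyc 7742

# (vmid, name, vcpus, memory in MB): 5 x 32 vCPUs = 160 vCPUs on 128 cores,
# i.e. a modest 1.25x oversubscription ratio.
VMS = [
    (101, "app-01", 32, 65536),
    (102, "app-02", 32, 65536),
    (103, "app-03", 32, 65536),
    (104, "app-04", 32, 65536),
    (105, "app-05", 32, 65536),
]

total_vcpus = sum(vcpus for _, _, vcpus, _ in VMS)
print(f"{total_vcpus} vCPUs on {PHYSICAL_CORES} cores "
      f"({total_vcpus / PHYSICAL_CORES:.2f}x oversubscription)")

for vmid, name, vcpus, memory_mb in VMS:
    # Creation returns a task ID; a real script would poll that task before
    # starting the VM with proxmox.nodes(NODE).qemu(vmid).status.start.post().
    task = proxmox.nodes(NODE).qemu.create(
        vmid=vmid,
        name=name,
        cores=vcpus,
        memory=memory_mb,
        net0="virtio,bridge=vmbr0",   # default Linux bridge
        scsi0="local-lvm:64",         # 64 GB disk on the default LVM-thin storage
    )
    print(f"created {name}: {task}")
```

The same thing can be done from the web UI or the qm command line; the point is only that, for workloads like these, the guests' combined vCPU count can comfortably exceed the physical core count.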

Getting started with Proxmox is easy.

TODO


Hey Wendell!

Near the end of the video I came here from, you were asking about how-tos for server stuff. This video talks about consolidating to high CPU density. What about a video on building a GPU-dense server? I’ll soon be looking at either consolidating a few desktop rigs into a server or setting everything up in the cloud. I posted a few more details in a thread over here.

Also, if you’re going to be at CES this year, I’d love it if you stopped by our booth so we could show you what it’s being used for!

Hello Wendell,

You said in your video multiple times that it is possible to move all the old hardware onto one shiny new server. My two questions are: should you? And what about noisy neighbours and a “single point of failure”?

How long would it take to migrate the whole 2-4 TB of memory onto the new server (spinning up VMs, services, etc.), if that’s even possible…

One strategy for this is to have a 3-node cluster.

Any one node can be down. On a critical failure you just wait for the VMs to boot back up. Otherwise, over 100G crossover links, a full migration only takes about 2 minutes per terabyte of RAM.
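For what it’s worth, that two-minutes-per-terabyte figure squares with simple back-of-the-envelope arithmetic. A quick sketch, where the ~70% effective link efficiency is an assumption rather than a measurement:

```python
# Rough estimate of live-migration time for 1 TB of guest RAM over a
# 100 Gbit/s link. The efficiency factor is an assumption, not a measurement.
LINK_GBPS = 100          # 100 Gbit/s crossover link
RAM_TB = 1.0             # amount of guest RAM to move
EFFICIENCY = 0.7         # assume ~70% of line rate after protocol overhead
                         # and dirty-page re-copies

bytes_to_move = RAM_TB * 1e12
link_bytes_per_second = LINK_GBPS * 1e9 / 8 * EFFICIENCY

minutes = bytes_to_move / link_bytes_per_second / 60
print(f"~{minutes:.1f} minutes per TB")   # prints ~1.9 minutes per TB
```

Raw transfer at full line rate would be closer to 80 seconds per terabyte; the re-copying of dirtied pages during a live migration presumably accounts for much of the gap up to the two-minute mark.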


Has anyone here set up an HPC cluster using these machines for scientific compute? I am looking into doing this and using Slurm as the workload manager. I have experience with Slurm as a user, but not deploying a cluster. I would appreciate any advice and/or pointers to good reference material.

Thank you in advance!

To Igor42

I’m purchasing one with 2 Epycs on board, 512 GB of RAM, and 168 TB of SATA HDD storage. I will use it to do N-body + hydro cosmology/astrophysics simulations. To cut the price I have to do the management myself, including installing the workload manager. There are even Slurm tutorials on YouTube, so that would be one reference, but I’ll share my experience too.

At first I was offered a 32-core Intel server at ~$18,000, but then I looked around at Epyc servers and am so happy I did. The system is going to eat ~$24,000 from my research funds, which is a big chunk but money well spent, and I’ll have a dedicated machine all for my research!

For storage you might also have to consider RAID options, which is another headache for me, but I hope I can manage it and share that experience as well.

