Does a proper dual SP3 workstation motherboard exist?

If not, is the ROME2D16-2T as close as it gets, or is there something else out there that I should be aware of for an ML workstation build?

If you’re doing ML, I assume you’re using multiple GPUs and not relying on the CPU cores. In that case, TR Pro would be more your speed.

Epyc is for servers; TR Pro is for workstations.

The reason he says this is that Threadripper Pro has both high clock speeds and lots of cores, while Epyc just has lots of cores. GPUs are often limited by a single-threaded task on the CPU, and adding more cores does not remove that bottleneck.
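The bottleneck argument above is essentially Amdahl's law. A minimal sketch, assuming a hypothetical workload where half of the CPU-side work is single-threaded (the 0.5 figure is made up for illustration, not measured):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical: 50% of the CPU-side work is single-threaded.
    for cores in (8, 16, 32, 64, 128):
        print(f"{cores:>3} cores -> {amdahl_speedup(0.5, cores):.2f}x")
```

With a 50% serial share the speedup can never reach 2x however many cores are added, which is why a higher clock speed helps more in that situation.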

Last I heard, the Nvidia RTX 3090 Ti ran about 200 times faster than the fastest 13th-gen Intel CPU for ML tasks. Depending on the ML task, the Nvidia RTX 4090 runs at 2x to 6x the speed of the 3090 Ti.

From my tests, the Apple M2 AI processor is about 40% the speed of the RTX 4090 in Stable Diffusion. You can get a whole computer for about 1/3 the price of an RTX 4090, but fewer models are optimized for Apple ML.

You could also try renting time on a few cloud servers and running your workload there. That way you can discover which hardware your task runs best on.

How does ML actually work? Is it software driven? Does the software depend on GPU performance, CPU performance, or a mix of both?

As others have already said, wouldn’t a TR Pro better suit your needs? The mobo you mentioned will have a similar number of slots for GPUs as a TR Pro build, so I can only think of the extra RAM capacity being your major deciding factor.

Mostly GPU.

The Genoa (the generation after Rome and Milan) motherboards are shipping (I got one), but the low-core-count (higher clock speed, and cheaper) CPUs are not yet shipping. They were announced in November, but only started shipping last week. Scalpers are doing their thing, raising prices and hurting supply.

The advantages of Genoa over Rome are:
About 40% more computation performed per cycle, so a 4 GHz Genoa is roughly as fast as a 5.6 GHz Rome.
Genoa has DDR5 in 12 channels, all going to the central chip; Rome has DDR4 to each chiplet, and the chiplets can then exchange data among themselves. With Rome, if you used RAM on channels going to CPUs that you weren’t using, those transactions could be several times slower. Also, DDR5 is 30% to 3x faster, and there are 50% more channels, which is up to 50% better as long as you fill all of the slots. Total throughput goes from 80 GB/s to over 400 GB/s. But supply is currently low: from China, DDR5 RDIMMs are 2x the price of DDR4 RDIMMs; from the USA, they are close to 10x the price, though that may be resolved in the next few weeks.
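For anyone who wants to redo the arithmetic behind the per-cycle and memory-channel claims above, here is a rough sketch. It assumes the 40% per-cycle gain quoted in the post, and it computes theoretical peak bandwidth per socket for DDR4-3200 on 8 channels versus DDR5-4800 on 12 channels with a 64-bit (8-byte) data bus per channel. Measured throughput will be lower, so these numbers are not expected to match the 80 GB/s and 400 GB/s figures quoted above.

```python
def equivalent_clock_ghz(clock_ghz: float, per_cycle_gain: float) -> float:
    """Clock the older chip would need to match, given a per-cycle gain (0.40 = 40%)."""
    return clock_ghz * (1.0 + per_cycle_gain)


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


if __name__ == "__main__":
    # 40% is the figure from the post, not a measured value.
    print(f"4.0 GHz with +40% per cycle ~ {equivalent_clock_ghz(4.0, 0.40):.1f} GHz")
    print(f"8 x DDR4-3200:  {peak_bandwidth_gb_s(8, 3200):.1f} GB/s per socket")
    print(f"12 x DDR5-4800: {peak_bandwidth_gb_s(12, 4800):.1f} GB/s per socket")
```

Both bandwidth figures assume every channel is populated; leaving slots empty reduces the channel count in the formula.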

Genoa has PCIe 5; Rome had PCIe 4. PCIe 5 is twice as fast, but currently little hardware exists that uses it. PCIe 5 is backwards compatible with 4, 3, 2, and 1.
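To put numbers on "twice as fast": a small sketch of theoretical per-direction link bandwidth, using the standard transfer rates (8, 16, and 32 GT/s for PCIe 3, 4, and 5) and 128b/130b encoding. Protocol overhead is ignored, so real-world throughput is somewhat lower.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later send 128 payload bits in every 130-bit block.
ENCODING_EFFICIENCY = 128.0 / 130.0


def pcie_gb_s(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s for a link of `lanes` lanes."""
    return GT_PER_S[gen] * lanes * ENCODING_EFFICIENCY / 8.0


if __name__ == "__main__":
    for gen in (3, 4, 5):
        print(f"PCIe {gen} x16: {pcie_gb_s(gen):.1f} GB/s per direction")
```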

Actually in my case it’s mostly CPU, as I focus on reinforcement learning. GPUs are often used for reinforcement learning, especially when training with really large batch sizes, but in my particular case, CPU is king.

Thanks for the suggestion, but I’ve already done this. My problem (reinforcement learning in bullet physics environments) benefits tremendously from high CPU core counts and lots of CPU cache, which is why I’m targeting a dual 7773X build.

That’s not to say I’ll never do any supervised ML, but that won’t be the main focus of this rig.

Are these workstation class boards, or server boards?

This is awesome, but not very important for my application. In reinforcement learning you tend to have a ton of worker processes operating in parallel, each with their own copy of the environment, each doing a ton of model inferences in order to collect feedback about how that version of the model performs in the environment. This feedback is shuffled back to one or more central learner processes, but the size and rate of said feedback is negligible compared to the fetches required for the inference and stepping of the environment.
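A toy sketch of that worker/learner pattern, in case it helps picture where the CPU time goes. Everything here is hypothetical: `step_env`, `policy`, and `run_worker` are stand-ins for a real physics environment and model, and the sketch defaults to a thread pool so it runs anywhere. A real setup would pass a `ProcessPoolExecutor` (or use a dedicated RL framework) so each worker gets its own core.

```python
import random
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Optional, Tuple


def step_env(state: float, action: float) -> Tuple[float, float]:
    """Toy environment: the reward is higher the closer the state stays to zero."""
    next_state = state + action
    return next_state, -abs(next_state)


def policy(weight: float, state: float) -> float:
    """Toy one-parameter 'model': push the state back toward zero."""
    return -weight * state


def run_worker(job: Tuple[int, float, int]) -> Tuple[int, float]:
    """One worker: own environment copy, many inferences, one small result."""
    worker_id, weight, steps = job
    rng = random.Random(worker_id)  # each worker starts from its own state
    state = rng.uniform(-1.0, 1.0)
    episode_return = 0.0
    for _ in range(steps):
        action = policy(weight, state)           # model inference
        state, reward = step_env(state, action)  # environment step
        episode_return += reward
    return worker_id, episode_return


def collect_returns(weight: float, num_workers: int, steps: int,
                    executor: Optional[Executor] = None) -> List[float]:
    """Fan out to workers and gather the (tiny) feedback for the learner."""
    jobs = [(i, weight, steps) for i in range(num_workers)]
    if executor is None:
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            results = list(pool.map(run_worker, jobs))
    else:
        results = list(executor.map(run_worker, jobs))
    return [episode_return for _, episode_return in results]
```

Note how each worker sends back a single float per episode, while the inner loop does all the inference and stepping locally. That is the asymmetry described above.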

That’s not to say that Genoa wouldn’t have other benefits, such as the memory bandwidth you mentioned. AVX-512 is probably the most interesting and impactful for my particular application. However, I’m somewhat skeptical that I’d be able to achieve as good performance on a $15k Genoa build as I can on a $15k Milan-X build for the specific problems on which I’m focussed.

And, lucky for you, Genoa started shipping 2 weeks ago, so Milan CPUs (on eBay) are now about 15% of the price they were 2 weeks ago. I just checked, and there are some 64-core units selling for a $1000 buy-it-now price; 2 weeks ago they were over $6000.

You can buy a used motherboard off eBay, or a new motherboard from Newegg etc.
A few considerations when you get your motherboard:
If you intend to use a full-length PCIe card without a riser card, make sure that there is clearance for the card.
https://www.supermicro.com/en/products/motherboard/h12dsi-nt6
BTW, from Supermicro you need an “H12” board for Milan support.
and
https://www.asrockrack.com/general/productdetail.asp?Model=ROME2D16-2L%2B#Specifications
has the CPU sockets one behind the other, to allow the use of full-length PCIe cards.
https://www.asrockrack.com/general/productdetail.asp?Model=ROME2D16NM3-2T#Specifications
has both CPUs side by side at the rear of the board, requiring riser cards to use full-length PCIe devices.

Oh yeah, way ahead of you there. Although I’m after Milan-X, and those didn’t really see as much of a drop. Ordered two 7773X chips and that Gigabyte board (v3.0, don’t worry) that I can never remember the part number for (but if you know Rome/Milan motherboards, you likely know the one).

Edit: MZ72-HB0 v3

Edit2: I should also mention that I gave up on making it a workstation build and it’s going to live in a rack in my basement/garage along with my old crusty pair of IBM 3950 x5 servers.

Trying to build something very similar. Eyeing the
https://www.asrockrack.com/general/productdetail.asp?Model=ROME2D16-2T
but I can’t find it anywhere currently in stock.
Looking at custom loop water-cooling for the CPUs and adding more cooling to the VRMs.

I wish I could test the CPUs somewhere since I am ordering them used. Hope I can get the board in time to set up a build.

Yeah, I went down a very similar path. That ASRock Rack motherboard is unobtanium. Gigabyte or Supermicro are likely your best alternatives, IMO. There are also a few Tyan boards out there, but the price, in-line chip alignment, and dual 10GbE NICs on the Gigabyte board fit best for my build in the end. Would’ve rather they’d used Intel controllers for those, but I’ll likely slap a 40GbE Mellanox card in there at some stage.

I entertained the idea of a custom loop for all of 34.6 seconds, but after deciding that it’s going in a rack, I went with a Dynatron A39. I’ll be buying a case that can fit a 360mm radiator though, just in case I see throttling on the A39.

FWIW, if you want to benchmark Milan vs Milan-X, Azure has HBv3 instances that are Milan-X, and GCP has T2D instances that are Milan CPUs. Unfortunately, both use parts that are only available to the major cloud vendors, but it’s better than nothing.

Newegg claimed they sold an ASRock Rack board 3/09/2023…
I plan to use it as a real “workstation” to code on, do some AI with GPUs, and so on. Compilation tasks these days do quite well in parallel. Waiting less = happier me.

As for testing: for me it’s less about performance and more about stability.

@some_rando if possible, do you have a way to test how a Windows 11 boot would work?
Perhaps more importantly: do you think you could test for DPC latency?
The reported DPC latency is concerning to me, as I may want to do some audio work at some point. Not a core use case, but still.
I wonder if they may have improved something.
ASRock Rack and Supermicro seem to do a lot better with their boards.

If you are going to be booting Windows 11, make sure the motherboard will take a TPM module, and that the module is actually available.

The only source of a Tyan TPM module I could find was in the UK.

Sure. I’d planned on installing Windows briefly just to do a few benchmarks to validate thermals. I won’t be installing a TPM, but I can get around that w/ the registry hacks in the installer.

Are there particular tests you want to see run, or do the tests mentioned in this Reddit post work for you?

It will likely be around a month until I’m able to do this, however. I’ll create a new thread for it here when I do. Do you want me to ping you in it?

We can sync up when you get closer. I bit the bullet and ordered the Gigabyte board today myself. Now I have to piece together all the other components… RAM: I’m torn on whether I should start with fewer modules and hope prices go down. (I haven’t read what the minimal configuration is.)

First off, in that case I hope you aren’t planning on Milan-X. The 3D V-Cache is nice, but it’s really only beneficial for a few very specific workloads. If one of those workloads isn’t a primary use case for you, by going with Milan-X you’d be paying a premium for worse performance.
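One rough way to reason about whether a workload is one of those cache-sensitive cases: compare the per-worker working set against the average L3 share per core. The sketch below uses the published L3 totals (256 MiB for the EPYC 7763, 768 MiB for the 7773X, both 64 cores). The 8 MiB working set is a made-up example, and since L3 is actually shared per 8-core CCD rather than split evenly per core, treat this as a first-order estimate only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    name: str
    cores: int
    l3_mib: int  # total L3 cache per socket, in MiB

    def l3_per_core_mib(self) -> float:
        """Average L3 share per core (ignores the per-CCD sharing layout)."""
        return self.l3_mib / self.cores


MILAN = Cpu("EPYC 7763", cores=64, l3_mib=256)
MILAN_X = Cpu("EPYC 7773X", cores=64, l3_mib=768)


def fits_in_l3(cpu: Cpu, working_set_mib: float, workers_per_core: int = 1) -> bool:
    """True if each core's workers fit in that core's average L3 share."""
    return working_set_mib * workers_per_core <= cpu.l3_per_core_mib()


if __name__ == "__main__":
    working_set_mib = 8.0  # hypothetical per-worker working set
    for cpu in (MILAN, MILAN_X):
        verdict = "fits" if fits_in_l3(cpu, working_set_mib) else "spills to RAM"
        print(f"{cpu.name}: {cpu.l3_per_core_mib():.0f} MiB/core -> {verdict}")
```

If the working set fits comfortably on both parts, or on neither, the extra cache is unlikely to pay for itself.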

Regarding stability, I think most any dual SP3 setup is going to be pretty darn stable as long as:

  1. you have good cooling and airflow,
  2. the driver support isn’t complete cat vomit (this is my first Epyc build, but AFAICT it’s not),
  3. you have a power supply that’s chonky enough to handle the peaks, and
  4. you are running decent RAM (ideally from your mobo’s QVL).

This is in no small part due to Epyc parts being SoCs, meaning so much more of your system is actually under the lid of the CPU. This should in theory make systems built with these parts behave much more consistently than proper workstation or desktop systems built with motherboards that have chipsets in the mix.

Thanks for the suggestions. Well, I already bought some 7763s that I am expecting to arrive soon. I got a really good (used) price and hope they are not wonky.
They ship from within the US.
As for RAM, I was eyeing Nemix RAM, as the price point is so attractive.
There are some offers for used RAM, but it will probably be more difficult to get enough of the same type.

As for cooling, I plan to use a custom water loop with sTR4/SP3 waterblocks, and I have to figure something out for the VRMs. Maybe add ginormous heat sinks?

Actually I am wondering if a custom loop is the best idea since I have separate fan headers for both CPUs. Maybe one AIO per CPU? Or just ignore one of the fan headers?

The computer will be in the same room as me, so I can’t have a lot of fan noise.