My previous workstation (EPYC Rome) got repurposed as a full-time server, so I’ve reverted to using an old Dell with a 4C4T Haswell CPU for now. For day-to-day stuff it’s surprisingly OK! But, I do AI/ML and scientific computing development work, and while I have access to remote servers with high specs, it’s super nice to be able to test, develop, and iterate locally. So at some point soon I will probably build a new workstation.
For various reasons, I’m considering a W790 build with the lowest spec 6 core Xeon W3-2423, and I want to know if you guys think I’m crazy. Let me explain my reasoning:
Why Intel? I’m drawn to Intel because of VTune, AMX, and hardware support for float16. I do a lot of low-level code tuning, and it would be nice to be able to evaluate the latest instruction sets on my own to determine whether they’d benefit my use cases. A tool like VTune also helps with that.
Priority: single core performance. As this is a development workstation, I’m usually running one job at a time. While many of those jobs have parallel portions, the “backbone” of the software I write is typically more dependent on single-core performance and GPU performance than CPU multicore performance.
Price. The 6 core Xeon and the Gigabyte MW53-HP0 come in at under 1000 USD. By comparison, suppose I were to go with the 12 core overclockable chip. If I actually want to overclock it (which I would because again single core performance is pretty important for me), then I’d have to get a substantially more expensive motherboard and the baseline CPU and motherboard price is more like 1900 (not to mention overclockable RAM would drive the price up a bit too). The difference might matter because I may have to get a modern 48GB GPU and they’re so expensive that saving a grand on the base system would actually help here.
Why not a desktop chip? In addition to the instruction set nice-to-haves that I mentioned above, I’m worried about connectivity. I’ll have two GPUs in the system, and at least one of them needs all 16 lanes. In addition, I need at least 6 TB of NVMe storage (probably U.2), and if there’s room to grow upwards from that (or at least space for a 10 Gb NIC so I could expand via NAS), that would make me more comfortable. I am open to a desktop system in principle, though, if anyone has ideas.
Now please tell me if this combination, the W3-2423 and Gigabyte MW53-HP0, is a bad idea. One thing I’m a bit worried about is how I’m going to connect a 6 TB U.2 drive to it, but I think there are enough M.2 slots and PCIe slots that with some sort of adapter I can make it work (granted, that adapter could add 100 dollars to the cost).
And to be clear, I would much prefer a 16 core overclockable chip, it’s just that I think for my use cases, the 48GB of VRAM is going to make more of a difference than the CPU clock and core count difference on the CPU side, so I’d rather put the dollars in that direction. 6 cores will (hopefully) get the job done (just slower) while with VRAM you either do or do not have enough of it.
Wow, so this mobo has 4 x16 PCIe 5 slots from the CPU and another 16 from the chipset. That’s some nice bandwidth for GPUs and SSDs. And 512GB through RDIMMs or 2TB using RDIMM-3DS, so really money is the only limitation. That CPU is really a minimal workstation CPU, but at least you have a path to 56 cores. I think this with a 16 core would indeed be very useful. If you’re spending on high-VRAM GPUs, then from what I’ve seen this is a better approach to running AI models. I’m still on my 10980xe but am contemplating something like this as well, as consumer motherboards are just so limited in terms of IO.
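To make the connectivity argument concrete, here is a rough lane-budget sketch in Python. The device list follows the requirements stated earlier in the thread (two GPUs, one U.2 drive, a 10 Gb NIC), and the 64 CPU lanes come from the 4 x16 slots mentioned above. The x16 width for the second GPU, the x4 width for the NIC, and the 24-lane "desktop" budget are assumptions for illustration only, not figures from any spec sheet quoted here.

```python
# Rough PCIe lane budget. Lane widths marked "assumed" are illustrative guesses.
REQUIRED_LANES = {
    "GPU 1": 16,          # stated: at least one GPU needs all 16 lanes
    "GPU 2": 16,          # assumed worst case; it may be fine at x8
    "U.2 NVMe drive": 4,  # U.2 NVMe drives use a x4 link
    "10 Gb NIC": 4,       # assumed; depends on the card
}

W790_CPU_LANES = 4 * 16   # the four x16 CPU slots mentioned in the thread
DESKTOP_CPU_LANES = 24    # hypothetical desktop budget, for comparison only


def total_lanes(devices: dict[str, int]) -> int:
    """Sum the lane widths of all devices."""
    return sum(devices.values())


def fits(devices: dict[str, int], available: int) -> bool:
    """True if every device can run at full width without a PCIe switch."""
    return total_lanes(devices) <= available


if __name__ == "__main__":
    need = total_lanes(REQUIRED_LANES)
    print(f"lanes needed: {need}")
    print(f"fits in {W790_CPU_LANES} CPU lanes: {fits(REQUIRED_LANES, W790_CPU_LANES)}")
    print(f"fits in {DESKTOP_CPU_LANES} CPU lanes: {fits(REQUIRED_LANES, DESKTOP_CPU_LANES)}")
```

Under these assumptions the workstation board has headroom left over for more drives, while the hypothetical desktop budget only works if the second GPU drops to x8 or narrower.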
4.2GHz boost may be a bit too low. If it were me, I would go with a 2455X and OC to 4.8GHz or 5.2GHz. That would give approximately Intel 12th Gen single core performance. I’m not sure how much this would matter in AI/ML workloads, though (especially since AMX runs on the CPU, I would assume clock speed is going to matter quite a bit here).
Also, I would prefer to wait a bit for the next W790 CPU. Emerald Rapids has much better idle power consumption than Sapphire Rapids (although whether the 5th Gen Xeon workstation chips will be Emerald Rapids or a Sapphire Rapids refresh is still unknown).
Can’t really find any holes in your logic - like you say, more CPU cores/clocks might be nice (I’d be really tempted by the W3-2435 for 2 extra cores and a much higher base clock), but not at the cost of things you definitely need.
As to the U.2 drive connectivity, you can just pick up a cheap M.2 to U.2 adapter (I’m using this with an Optane P4800X) or a PCIe adapter card for the x4 slot (e.g. this Startech card).
I actually think that you’d be fine with a decent AM5 platform, especially with PCIe 5.0, as far as connectivity goes. Given Gigabyte’s less than stellar track record on their workstation boards, I’d look at another brand such as Asus, Tyan or MSI. I’d also be quite concerned about cooling the chip and motherboard adequately outside a data center environment / rack, as there have been multiple reports here about overheating parts (a quality of life / noise issue, since it’s going to be a workstation).
I was in kind of the same boat 18 months ago as you are right now. The W790 platform was not available yet.
It seems like we are doing similar things, kind of. I had the choice to go for an AMD 5800x or 5950x and limit myself with expandability and AVX-512 unavailability. I chose not to, and instead looked into slightly older platforms and sacrificed single-core performance.
My questions:
How much RAM do you need?
Speaking of a ‘modern 48GB’ GPU: I assume you’re considering an A6000 or a 6000 Ada?
Another thing to consider when picking motherboards (if you care about high-quantity U.2 connectivity) is the built-in PCIe redriver situation. I’ve got a W790 Sage, and only the 16 lanes on slot 4 and the first 8 lanes of slot 6 have built-in PCIe redrivers. The redrivers are configurable in 2-lane chunks to allow for different bifurcation configurations.
It sounds like everything you need is on W790. AMD definitely gave up the sub-$1K bracket this generation, because I suppose they (wrongly) assume that people who need massive I/O will always also need tons of cores. Considering their IPC and core clock advantages in the HEDT space (sometimes as much as 1.5GHz higher vs. SPR) you’d think they’d offer SOMETHING at the lower budget points.
VTune and AMX, sure. But float16 has been universally supported in hardware since CVT16 in 2009. RISC-V and ARM also support F16C. So, you specifically need the AVX-512_FP16 extensions for your work, and the universal F16C instruction extensions don’t work?
F16C converts between half-precision and single/double-precision, it doesn’t do math on the half-precision packed registers, so code to process half-precision floats would need to convert to single-precision, do the math, then convert back. Since single-precision takes twice as many bits, you can only fit half the content in an XMM/YMM register, so it takes at least twice as long (conversion overhead too) as native half-precision math operations (AVX512).
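A small Python sketch of the difference described above, using only the standard library. It is an emulation, not real SIMD: `struct`’s `"e"` format stands in for the half-precision conversion step, and Python’s double-precision float stands in for the wider type (F16C itself widens to single precision). All function names here are made up for illustration.

```python
import struct


def to_f16_bits(x: float) -> int:
    """Narrow a float to IEEE half precision and return its raw 16 bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def from_f16_bits(bits: int) -> float:
    """Widen raw half-precision bits back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def add_f16_via_widening(a_bits: int, b_bits: int) -> int:
    """Conversion-only style: widen both operands, add in the wider type,
    then narrow the result back to half precision."""
    return to_f16_bits(from_f16_bits(a_bits) + from_f16_bits(b_bits))


def lanes(register_bits: int, element_bits: int) -> int:
    """How many elements of a given width fit in one vector register."""
    return register_bits // element_bits


if __name__ == "__main__":
    total = add_f16_via_widening(to_f16_bits(1.5), to_f16_bits(2.25))
    print("1.5 + 2.25 in half precision:", from_f16_bits(total))
    # A 256-bit register holds twice as many halves as singles, which is
    # where the "at least twice as long" estimate comes from.
    print("fp16 lanes per 256-bit register:", lanes(256, 16))
    print("fp32 lanes per 256-bit register:", lanes(256, 32))
```

The widen, compute, narrow round trip in `add_f16_via_widening` is the extra work that native half-precision arithmetic avoids.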
I’m under the impression SPR IPC is a decent amount higher than Zen 4 Threadripper’s; so much so that the 0.5-0.7GHz clock speed advantage Threadripper 7000 has over SPR-WS in single-threaded applications doesn’t let it overtake SPR-WS’s performance. I’m getting this idea from the SkatterBencher results.
SPR is Golden Cove. Golden Cove has an IPC deficit compared to Zen 4.
Compared to Zen 3 it’s still not terribly impressive performance, but at least there you can see SPR has higher IPC. A lot of performance for SPR comes from dedicated accelerators, not unlike Apple’s SoCs, while the main CPU cores are still just the same architecture we’ve seen since 12th Gen. Intel’s FPU design on Golden Cove is also exceptionally strong, so FP-bound workloads will see a boost.
I suppose it’s kind of nebulous to say “single threaded IPC” without a workload.
Here are some single-threaded benchmarks of a w7-3465x and a Threadripper 7980x from SkatterBencher, first at stock (listed boost 4.8GHz and 5.1GHz respectively) and then overclocked (5.3GHz and 5.75GHz respectively).

Stock:

|Benchmark|W7-3465x|Threadripper 7980x|
|---|---|---|
|super pi 4m (s, lower is better)|36.6|29.908|
|Geekbench 6 single threaded|2277|2868|
|cpu-z single threaded|709.4|733.9|
|cinebench r23 single threaded|1714|1890|

Overclocked:

|Benchmark|W7-3465x|Threadripper 7980x|
|---|---|---|
|super pi 4m (s, lower is better)|32.643|29.527|
|Geekbench 6 single|2595|3027|
|cpu-z single threaded|821.9|781.3|
SPR wins in IPC against Zen 4 in the latter two but not the first two.
My initial impression that SPR vastly outperformed Threadripper 7000 in single threaded was because I was looking at OC benchmarks of SPR vs OC benchmarks of TR7000.
SPR matches Zen 4 IPC in the first two benchmarks and then beats zen 4 by 20% in the last benchmark.
Yeah you’re right, I’m going to get rid of that chart and just replace it with the OC chart because now I’m not even sure the “stock” boost speeds are being hit on the processors, at least with OC chart you know exactly what frequency is being used.
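For anyone who wants to redo the per-clock comparison, here is a rough Python sketch. It assumes the higher set of scores above (32.643 / 2595 / 821.9 for the Xeon) belongs to the overclocked runs at 5.3GHz and 5.75GHz, and it treats "score per GHz" as a crude stand-in for IPC, which ignores memory and boost behaviour entirely. The function names are invented for this example.

```python
# Overclocked frequencies quoted in the thread (assumed to match these scores).
W7_GHZ = 5.3
TR_GHZ = 5.75

# (benchmark, higher_is_better, W7-3465x result, Threadripper 7980x result)
OC_RESULTS = [
    ("super pi 4m", False, 32.643, 29.527),   # seconds, lower is better
    ("Geekbench 6 single", True, 2595, 3027),
    ("cpu-z single threaded", True, 821.9, 781.3),
]


def per_clock(result: float, ghz: float, higher_is_better: bool) -> float:
    """Work done per GHz. For timed runs, work is proportional to 1 / time."""
    if higher_is_better:
        return result / ghz
    return 1.0 / (result * ghz)


def xeon_vs_threadripper() -> dict[str, float]:
    """Per-clock ratio W7 / Threadripper; above 1.0 favours the Xeon."""
    ratios = {}
    for name, higher_is_better, w7, tr in OC_RESULTS:
        ratios[name] = (per_clock(w7, W7_GHZ, higher_is_better)
                        / per_clock(tr, TR_GHZ, higher_is_better))
    return ratios


if __name__ == "__main__":
    for name, ratio in xeon_vs_threadripper().items():
        print(f"{name}: {ratio:.3f}")
```

With these inputs the ratio lands a little below 1 for Super Pi and Geekbench 6 and above 1 for CPU-Z, so the per-clock picture depends heavily on the workload.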
Very interestingly, the programmed single-threaded max boost frequency for the 7980x was 5.65GHz stock, not the 5.1GHz listed. It’s like the opposite of the situation with Intel, where the Xeons were clocking ~800MHz less in single-threaded tasks than was claimed on the spec sheet. This can probably be blamed on the awful power management Intel implemented for these processors; SPR really benefits from OC.
Oh interesting suggestion, thanks for pointing it out.
I’m pretty sure we’re getting an SPR refresh, not emerald rapids workstation chips, but nevertheless I can probably wait a few months and see what these end up being like.
Are you thinking of some method by which to break out PCIe 5 lanes into more PCIE 4 lanes? GPUs and most SSDs are still Gen4, so this would make sense if there were an easy way to do it.
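As a back-of-the-envelope check on why that trade would be attractive, here is a short Python sketch of raw link bandwidth. It uses the per-lane transfer rates for PCIe 3.0 through 5.0 (8, 16 and 32 GT/s) with 128b/130b encoding and ignores protocol overhead. It says nothing about whether a given adapter or switch can actually do the conversion; plain bifurcation only splits a slot’s lanes and does not create more of them.

```python
# Per-lane raw transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8


if __name__ == "__main__":
    for gen, lanes in [(4, 4), (4, 16), (5, 8), (5, 16)]:
        print(f"Gen{gen} x{lanes}: {link_gb_per_s(gen, lanes):.1f} GB/s")
```

A Gen5 x8 link carries the same raw bandwidth as a Gen4 x16 link, which is why fanning Gen5 lanes out to Gen4 devices looks appealing on paper.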
Minimum 192GB, and 256 would make me feel better.
Yeah I mean an ADA card. I may end up not doing this as they’re just SO expensive that I’m kind of affronted, but occasionally I do get into a situation where I exceed 24GB…
Interesting, good point… Is there any way to know which slots do and don’t have redrivers before I buy the board? In light of the suggestion by quilt, I’m also considering the ASUS W790 ACE which has two connectors for U.2 drives. No idea if they have redrivers, though.
As xzpfzxds says, F16C just handles conversions, unfortunately.
Good, floating point is key for me.
Thanks very much to everyone who replied so far. Still thinking this through
Yeah I mean an ADA card. I may end up not doing this as they’re just SO expensive that I’m kind of affronted, but occasionally I do get into a situation where I exceed 24GB…
As I mentioned, I was in a similar position back then, but I had a 3090, which was enough for me. I burned the card at one point; it’s OK, I learned a lot while doing it. What I learned by reverse engineering my total build costs is that the platform does matter, but the GPU is way more important. So, if you are budget-constrained and you know you will go down the line with a 6000 Ada, I would start with the card first.
A platform can also be upgraded, and a 1000 USD CPU and motherboard combo can be resold at one point. I don’t think the 6-core is a bad option but I don’t see it being the best option.
Minimum 192GB and 256 would make me feel better.
That should be feasible even on a Ryzen or Intel 13th gen platform.
Your option:
MW53-HP0 (550 USD Newegg)
W3-2423 (400 USD Newegg)
192GB (4x48GB) - (around 600 USD?)
Option 1: (Best Single Core Option)
High-end Ryzen 7800x system
4x48GB DDR5 ECC Memory (move later to platform)
Option 2: (Best Platform)
Something with DDR4 ECC (Cascade Lake, Ice Lake)
DDR4 is really cheap on ebay
24 core CPU with solid overall clocks.
You either get ‘best in class’ single-core performance for around the same money, or you can get a faster multi-core machine - now.
I had one 3090 and was convinced I needed all these PCIe lanes and everything. The reality is it took me another 12 months to go dual GPU because I didn’t feel the need to do so. I bought a 24-core Xeon Gold 6200 and was happy with it. Yes, Windows is not as snappy as on a 13th gen, but 192 GB of memory is cheap, and my stuff runs very well in parallel. The machine is definitely a keeper in terms of stability and everything. Would I do this all over again? I don’t think so. For a single-GPU start, a 7800x would be my choice at the moment; it has enough of everything, and once you know that a 2nd Ada needs to be installed, the platform can be used for a while (2x x8) and at some point upgraded to something bigger. The platform doesn’t give you the kind of benefits that a GPU can give you today.
Let’s face it, at least for me - a W790 is, to a certain extent, also an emotional buy.
I’m still going to try to avoid buying a 6000 Ada. I have been playing around a bit with model parallelism recently, to get a sense for how much work I’d create for myself by just buying multiple consumer cards to deal with the VRAM deficit. Inconclusive so far, but while part of my reluctance to buy workstation cards is the cost, part of it is that on principle I find the prices insulting considering they’re the same silicon.
So maybe one way to decide between best single core, and best platform (as you put it) is to decide beforehand whether I’m going to start with one workstation card or multiple consumer cards.
@twin_savage interesting point, sorry for missing it the first time around. Is there any way to know ahead of time which slots have redrivers? I don’t see anything about this in the W790 Sage manual. My candidate motherboard has changed to the W790 Ace, so there may be some details in common…