Considering the 6-core W790 Xeon - am I crazy?

That’s a good question. I didn’t see any mention of redrivers in the W790 SAGE manual, but they are obviously there.

Looking at the SAGE’s PCB, it appears there are 12 redriver chips, which lines up with the two-lane chunks of redriver settings available to tune in the BIOS:

Looking at the ACE’s PCB, it appears to have only 4 redriver chips, which would mean only 8 of its lanes have the ability to redrive the PCIe signal:

Thanks for the quick reply. Hmmmm… I might’ve hoped that the U.2 connectors had redrivers, but given the positioning, very likely no such luck.

fwiw, on the SAGE the SlimSAS connectors will run a U.2 drive at PCIe 4.0 speeds without issue, at least with the PCIe 4.0-rated microsata SlimSAS-to-U.2 cable I have. Since the SlimSAS ports hang off the chipset, they are limited to PCIe 4.0.
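
For anyone wanting to double-check what link a U.2 drive actually negotiated over one of those cables, here’s a minimal sketch (assuming Linux with pciutils installed; the parsing is just a rough cut, and `lspci -vv` may need root to show the link capability dump):

```python
# Minimal sketch (assumes Linux + pciutils): print the negotiated PCIe
# link for every NVMe controller, so you can confirm a U.2 drive is
# actually running at Gen4 x4 and not something slower.
import re
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout
for block in out.split("\n\n"):          # lspci -vv separates devices with blank lines
    if "Non-Volatile memory controller" in block:
        dev = block.splitlines()[0]
        # LnkSta reports the *negotiated* speed/width, e.g. "Speed 16GT/s, Width x4"
        for m in re.finditer(r"LnkSta:\s*Speed ([\d.]+GT/s).*?Width (x\d+)", block):
            print(f"{dev}\n  negotiated: {m.group(1)} {m.group(2)}")
```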

You actually mentioned that in another thread I started, now that I think about it: Connecting U.2 to W790 boards

That’s reassuring to know.

Can you provide sources for your claims about Emerald Rapids having better idle power consumption?
I’m considering getting a w3-2423 plus board too, because the setup I have right now (a 6800K + ASUS X99 DELUXE II) just pulls too much power at idle (board and CPU alone draw 57.9–62.9 W, not optimized, while the system reports <5% usage). Obviously I’d like to avoid that with my next setup.
However, I need a good amount of connectivity, and while a modern Ryzen could satisfy my needs right now, I’m also considering adding a few things to the system in the future, so I’d exceed the 24/28 lanes one can get with a Ryzen, or even what I have right now. Besides, the system mostly idles, and I hear AMD’s chiplet design really doesn’t help with idle power draw, no matter how efficient the chips are under load.
Since I don’t need a lot of performance, 6 cores are totally fine, and should I ever need an upgrade, the platform offers up to 56 cores (probably quite cheap by then, considering how the prices of other Xeons have developed).
So, the platform seems like a fit for me (my avg. system load is 7%), as long as idle power is good. If you have any other platform suggestions, I’m open to them; I just wasn’t able to find anything matching my needs (AMD’s chiplets, missing features/connectivity on older Intel platforms, and missing information on idle power consumption all conflict with what I’m after).

Why W790 and not C741? From what I just saw, W790 is more expensive on the motherboard side (entry point), while C741 is more expensive on the CPU side (entry point); in sum you end up at the same price, but with more cores on the C741 (server, not workstation) side. And Emerald Rapids is already available there.
The only reason I could see is that the workstation chips clock/boost higher, but please tell me if there are other things I’m forgetting here.
The server chip actually looks more compelling to me, considering I get more cores and the more efficient, lower-idle-power Emerald Rapids generation.

It was mentioned in ServeTheHome’s Emerald Rapids review that it has around 100 W less idle power consumption per socket than Sapphire Rapids.

STH mentioned at the end of that page that even though it’s not the 100 W Intel claims, they do see around 160–180 W lower power consumption with their dual-socket unit (which translates to 80–90 W savings per socket).

For reference, my 3495X constantly consumes ~160 W with a light workload (VMs) running.
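
If anyone wants to sanity-check their own idle numbers without a wall meter, here’s a rough sketch using the kernel’s RAPL counters (assuming Linux and an Intel CPU that exposes the powercap interface; reading `energy_uj` typically needs root, and RAPL covers the CPU package only, not the whole board):

```python
# Rough sketch: estimate CPU package power from the Linux powercap (RAPL)
# interface. Assumes /sys/class/powercap/intel-rapl* is present (Intel CPU)
# and that we can read energy_uj (usually requires root).
import glob
import time

def read_uj(path):
    with open(path) as f:
        return int(f.read())

domains = glob.glob("/sys/class/powercap/intel-rapl:*/energy_uj")
before = {d: read_uj(d) for d in domains}
time.sleep(10)  # sample over 10 seconds; longer windows give steadier numbers
for d in domains:
    delta_j = (read_uj(d) - before[d]) / 1e6  # microjoules -> joules
    with open(d.replace("energy_uj", "name")) as f:
        name = f.read().strip()  # e.g. "package-0", "dram"
    print(f"{name}: {delta_j / 10:.1f} W average")
```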

Just to jump in with a clarification, I believe the w-2500 series will be a Sapphire Rapids refresh, not Emerald Rapids.

Thanks, it seems worth the wait, so that’s what I’ll do. Sadly I was wrong with my earlier statement: C741 with the cheapest server Emerald Rapids does not come in at the same price as W790 with the cheapest workstation Sapphire Rapids. Four of the CPUs that were released as 5th Gen Scalable Xeons are Sapphire Rapids, not Emerald Rapids, and those four are the cheapest models :confused:
It’s hard to tell if the idle optimizations etc. also apply to those; I hate it when last-gen architecture gets mixed into some models of the latest gen…
C741 still seems like a valid option though: a motherboard for 630€ plus a CPU for 460€ (Xeon Bronze 3508U) with 8 cores but no Hyper-Threading, and for 530€ (Xeon Silver 4410Y) you even get 12 cores with 24 threads and higher boost. The only reason not to pay the extra here would be if the 3508U, released alongside Emerald Rapids, actually got the optimizations too, maybe even Raptor Cove cores, given a workload that is more focused on single-core performance and a server that idles most of the time (my case).
W790 instead comes with a CPU for 270€, but the motherboards start at 820€, and you only get 6 Golden Cove (Sapphire Rapids) cores, though those do have Hyper-Threading and can boost higher than the 3508U.
So it’s 1090–1160€ for C741 against 1090€ for W790. The only other benefit at this point is better general I/O on W790 boards, since they start with more PCIe slots etc.; however, C741 CPUs start with more PCIe lanes, and if you get a motherboard that uses them all, you probably also have a second socket, which is also supported by the more expensive C741 CPU I picked out here.
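
Just to make the comparison explicit, since I threw a lot of numbers around, here’s a quick sanity check of the sums (prices in EUR as I found them; obviously subject to change):

```python
# Entry-point platform sums from above, in EUR.
c741 = {"board": 630, "Bronze 3508U": 460, "Silver 4410Y": 530}
w790 = {"board": 820, "cheapest w3": 270}

print("C741 + 3508U:", c741["board"] + c741["Bronze 3508U"])  # 1090
print("C741 + 4410Y:", c741["board"] + c741["Silver 4410Y"])  # 1160
print("W790 + w3   :", w790["board"] + w790["cheapest w3"])   # 1090
```
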
But at that point the budget of the person who opened this discussion is exceeded, as is mine.
@hsnyder if you read this, I hope I was able to give you some good options and not just waste your time; maybe C741 is something for you, I can’t tell for myself yet :confused:

I think I saw an MLID leak saying they will release another gen of the w-3400 and w-2400 series (I don’t know how they’ll count the numbers up), and even though it will be called Sapphire Rapids Refresh, it will use Raptor Cove. My memory might be wrong here though, and if it’s not, it’s hard to tell whether those idle optimizations made it into the refresh as well.

@yralf fwiw, Puget Systems (who I think did a poor job in the initial SPR-WS reviews) measured the idle power consumption of a top-of-stack w7-2495X at 39 watts in Windows under the default power plan; however, that rose to 72 watts when the “High performance” plan was applied.

I’m making some gross generalizations here, but the only Sapphire Rapids CPUs that had bad idle power consumption were the chiplet-based XCC ones; the monolithic MCC-based CPUs did pretty well on idle power consumption considering the amount of I/O they have.

In the Emerald Rapids generation, only the XCC-die CPUs see a large decrease in power consumption, because Intel de-chiplet-ified them, making them more monolithic. This means the MCC-die Emerald Rapids CPUs are roughly on par in power consumption with their Sapphire Rapids MCC-die counterparts.

There are some weird exceptions like the 3508, 4509 and 4510, which are actually Sapphire Rapids CPUs.

No worries, this isn’t a waste of time at all. I haven’t bought anything yet, so I’m curious what other people know in terms of other options. Styp’s advice was pretty spot-on, I think. I’ll buy the GPU as soon as I’m forced to (which hasn’t happened yet) and then I’ll see how much money I feel like spending on the platform. I’m hoping to make it through to the w-2500 release, even if it’s just to know what I’m not buying.

Ahhh very interesting!

Why do you need 16 lanes on one GPU? The A6000 and RTX 6000 Ada are both PCIe 4.0, but the listed motherboard and CPU are PCIe 5.0, so you will get max GPU speed over 8 lanes from the motherboard and CPU. Maybe I’ve missed something?

Some high-end desktop motherboards like the ASUS ProArt Z790 Wi-Fi offer 10 Gb Ethernet. I don’t know if that 10 Gb performance is compromised by any build factors, e.g. all PCIe lanes in use?

Desktop can be more cost-effective. I went that way for my dual-GPU build as I wanted to put the money into GPU memory, and the 20 PCIe lanes from the Core CPU were OK for me (2 GPUs at 8 lanes each, one NVMe at 4 lanes).

I’ve no idea on the specific instructions you listed for desktop Core CPUs.

The suggestion above to up the spec of the motherboard and start with the basic Xeon definitely gives a good upgrade path if needed. A Xeon motherboard with 5 or 7 slots would also open the door to more than two GPUs, e.g. 3 used 3090 Tis, which would give you 72 GB of GPU memory and be cheaper than the 6000 Ada, whilst offering more performance for many AI workloads. Not sure about INT4; the 6000 Ada might be faster there.

Why do you need 16 lanes on one GPU? The A6000 and RTX 6000 Ada are both PCIe 4.0, but the listed motherboard and CPU are PCIe 5.0, so you will get max GPU speed over 8 lanes from the motherboard and CPU. Maybe I’ve missed something?

Unless I’m very much mistaken, if you plug a device that’s only capable of PCIe 4.0 into a 5.0x8 slot, you get a connection at 4.0x8.

3 used 3090 Tis, which would give you 72 GB of GPU memory and be cheaper than the 6000 Ada

GPU memory per card is my issue, unfortunately. Model parallelism is of course possible but it’s a pain, particularly when you’re just experimenting.

Oh, thanks, good to know. How do I tell which are XCC and which are MCC though?

For SPR there’s a chart that shows it:

You can also tell by the IHS; it’s noticeably different between the two.

I haven’t seen a chart for EMR yet, but it’s pretty easy to tell by L3 cache: the XCC-die SKUs have ~5 MB of L3 per core, roughly three times what the MCC-die EMR processors have.
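
If you have the machine in front of you, you can also just compute L3 per core on Linux. A quick sketch (assuming lscpu and the usual sysfs cache layout are available; the XCC/MCC threshold here is my own rough cut):

```python
# Quick sketch: compute L3 cache per core to guess XCC vs MCC on
# Emerald Rapids. ~5 MB/core suggests XCC; well under 2 MB/core
# suggests MCC.
import re
import subprocess

# L3 size as seen by cpu0 (shared per socket on these Xeons), e.g. "61440K"
with open("/sys/devices/system/cpu/cpu0/cache/index3/size") as f:
    l3_mb = int(f.read().strip().rstrip("K")) / 1024  # sysfs reports KiB

# Physical cores per socket from lscpu
out = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
cores = int(re.search(r"Core\(s\) per socket:\s*(\d+)", out).group(1))

per_core = l3_mb / cores
print(f"{per_core:.2f} MB of L3 per core ->",
      "probably XCC" if per_core >= 4 else "probably MCC")
```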

You are not mistaken, that is exactly how it works. A motherboard cannot magically turn a PCIe 5.0 lane into two PCIe 4.0 lanes on demand. A lane is a lane, regardless of what speed it runs at.
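
To put numbers on it, since each PCIe generation doubles the per-lane rate, here’s a quick back-of-envelope calculation (per direction, counting only the 128b/130b line encoding and ignoring further protocol overhead):

```python
# Back-of-envelope PCIe bandwidth per direction. Gen3 runs 8 GT/s per
# lane with 128b/130b encoding; each later generation doubles the rate.
def pcie_gbps(gen, lanes):
    gts = {3: 8, 4: 16, 5: 32}[gen]       # GT/s per lane
    return gts * lanes * (128 / 130) / 8  # -> GB/s

print(f"Gen5 x8 : {pcie_gbps(5, 8):.1f} GB/s")   # ~31.5 GB/s the slot could do
print(f"Gen4 x16: {pcie_gbps(4, 16):.1f} GB/s")  # ~31.5 GB/s the card could do
print(f"Gen4 x8 : {pcie_gbps(4, 8):.1f} GB/s")   # ~15.8 GB/s you actually get
```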

Every day is a school day.

I wonder how much performance I’m leaving on the table due to this. I’d assumed that the numbers in nvidia-smi and the NVIDIA control panel meant something different, and that the dual-GPU Core-CPU builds from big SIs like Scan, Lambda Labs and Bizon Tech would be a good workstation build template without making any serious performance compromises.
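
For what it’s worth, you can check what link each card actually negotiated with an nvidia-smi query. A small sketch wrapping it in Python (the query fields are standard nvidia-smi properties; note that idle cards often downshift the link to save power, so query under load):

```python
# Print negotiated vs. maximum PCIe link for each NVIDIA GPU.
# Assumes nvidia-smi is on PATH.
import subprocess

fields = ("name,pcie.link.gen.current,pcie.link.width.current,"
          "pcie.link.gen.max,pcie.link.width.max")
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout
print(out.strip())
```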

I’m on holiday this week, so I might do some benchmarking of single GPU vs dual GPU. Mostly I’m running Docker containers from NVIDIA, so I’ll need to find some suitable ML or DL benchmarks. If you have any suggestions I’d be interested. It might also be useful to post results here to help with your build choice.
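
One cheap benchmark that isolates the PCIe link specifically is timing a big pinned host-to-device copy; compute-bound training may barely notice x8 vs x16, but this will. A rough sketch with PyTorch (which the NVIDIA containers already ship):

```python
# Rough host-to-device bandwidth microbenchmark with PyTorch. This
# stresses the PCIe link itself, which is what an x8 vs x16 slot changes.
import torch

assert torch.cuda.is_available()
n_bytes = 1 << 30  # 1 GiB payload
src = torch.empty(n_bytes, dtype=torch.uint8).pin_memory()  # pinned for full-speed DMA
dst = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")

for _ in range(3):                      # warm-up copies
    dst.copy_(src, non_blocking=True)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
iters = 10
start.record()
for _ in range(iters):
    dst.copy_(src, non_blocking=True)
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1e3  # elapsed_time returns milliseconds
print(f"H2D: {n_bytes * iters / 1e9 / seconds:.1f} GB/s")
```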

While each PCIe generation doubles bandwidth per lane, you can’t simply convert Gen5 x8 into Gen4 x16.
