Cascade Lake, Ice Lake - Anyone have memory bandwidth numbers?

Was about to post an enormous wall of text but I think I’ll break it up into some concise questions that people might actually answer lol

Does anyone out there have any of these parts:

  • Cascade Lake-W 2000 (Xeon W-2265/75/95)
  • Cascade Lake-W 3000 (Xeon W-3245/65/75)
  • Cascade Lake-SP Xeon Gold 6256
  • Ice Lake-W 3300 (Xeon W-3335/45)

- with all memory channels populated, on which they’ve run the AIDA64 Memory/Cache benchmark (or something similar) and could share the results? Specifically, all I need are the memory read and write speeds.

Or, does anyone know any sites that have already done these? Because I can’t find anything.

Info is hard to come by. I have the single- and multi-thread performance benchmarks I need, but the closest memory bandwidth numbers I can find are for the Cascade Lake i9-109xx parts.

For threaded memory reads:

Xeon W-2295: 82,946 MBytes/Sec. Unfortunately this was under Windows; you’ll likely do better under Linux. The theoretical maximum for this configuration is about 87.5 GiB/s, so this isn’t too bad.

Xeon W-3275: Under macOS it’s getting up to 122,894 MBytes/Sec. This makes sense, because it has two additional memory channels compared to the Xeon W-22xx. Theoretical maximum bandwidth for this config is 131.13 GiB/s, so this is a very good score.

Xeon Gold 6256: 216,952 MBytes/Sec for a dual socket, so it’s going to be essentially identical to the 3275 for a single socket. These are both six-channel DDR4-2933 on the XCC Cascade Lake die, so they won’t score any differently. That is to say, the theoretical maximum bandwidth for this config is also 131.13 GiB/s per socket.

Xeon W-3345: This baseline clearly doesn’t populate all memory channels, but you’re looking at a theoretical maximum of 190.73 GiB/s with 8-channel DDR4-3200, so you can expect something in that ballpark, less 5-10%.

I should also note that, unless you’ve done something severely wrong (like running Windows), you’ll always see something within 5-10% of the theoretical speed. Useful if you can’t find benchmarks for a particular part.
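
Since that rule of thumb comes up a lot, here’s a quick throwaway sketch of the theoretical-max math (channels × transfer rate × 8 bytes per transfer) with the 5-10% derating applied. This is just my own illustration, not from any benchmark suite; the configs are the ones mentioned above, and a dual-socket Gold 6256 would simply be double the six-channel figure.

```c
/* Sketch: peak DDR4 bandwidth = channels * transfer rate * 8 bytes
 * (64-bit bus per channel). The 90-95% range reflects the
 * "within 5-10% of theoretical" rule of thumb from this thread. */
#include <stdio.h>

int main(void) {
    struct { const char *part; int channels; double mt_s; } cfg[] = {
        { "Xeon W-2295 (4ch DDR4-2933)", 4, 2933.33 },
        { "Xeon W-3275 (6ch DDR4-2933)", 6, 2933.33 },
        { "Xeon W-3345 (8ch DDR4-3200)", 8, 3200.00 },
    };
    for (int i = 0; i < 3; i++) {
        double peak = cfg[i].channels * cfg[i].mt_s * 1e6 * 8.0;  /* bytes/s */
        double gib  = peak / (1024.0 * 1024.0 * 1024.0);          /* GiB/s   */
        printf("%s  peak %6.2f GiB/s  expect ~%.0f-%.0f GiB/s\n",
               cfg[i].part, gib, 0.90 * gib, 0.95 * gib);
    }
    return 0;
}
```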


Awesome, good info. Huh, I’ve gone this entire time without even realizing Passmark had memory scores included in the baselines you can view online. TIL. I wish they included the memory SPD/manufacturer info etc. along with it.

Like I said, unless you’re doing something severely wrong you’ll see something within 5-10% of the theoretical maximum. Especially when you’re looking at systems with ECC, there is a very limited spread of CAS latencies and such between different manufacturers.

Keep in mind that memory performance is highly thread-dependent. If I recall right, Broadwell had something like 50% higher single-threaded memory performance compared to Skylake (even though Skylake was supposed to be superior; the point I’m making is that there are sometimes regressions in newer architectures), so it also varies a lot by architecture. The compiler matters too: it’s not uncommon to see Intel’s proprietary C compiler output HPC code that can perform memory transfers 25-50% faster than the same code compiled with GCC… even when run on an AMD CPU.
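
If you want to see that thread dependence on your own hardware, here’s a rough STREAM-style triad sketch (not the official STREAM benchmark, just an illustration; the array size and kernel are my own arbitrary choices). Build with something like `gcc -O3 -fopenmp triad.c -o triad` and re-run it with different `OMP_NUM_THREADS` values to watch measured bandwidth scale with thread count.

```c
/* Minimal STREAM-style triad sketch (not the official STREAM benchmark).
 * Arrays are sized well past last-level cache; the byte count ignores the
 * write-allocate traffic on c, so real DRAM traffic is a bit higher. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 26)   /* 64M doubles ~= 512 MiB per array, 1.5 GiB total */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* First-touch init in parallel so pages land on the local NUMA node. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        c[i] = a[i] + 3.0 * b[i];              /* triad: 2 reads + 1 write */
    double secs = omp_get_wtime() - t0;

    double bytes = 3.0 * N * sizeof(double);   /* a, b read; c written */
    printf("%d threads: %.1f GB/s\n", omp_get_max_threads(), bytes / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```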

While it’s kind of sparse at the moment, the Chips and Cheese website is pretty useful because it lets you look at memory bandwidth benchmarks at different thread counts and by memory depth.
Most real-world workloads don’t even approach the theoretical bandwidth a DIMM is capable of. Skylake/Cascade Lake/Ice Lake (pretty much all the mesh architectures) have been kind of weak in single-threaded memory performance compared to every other modern architecture.

So I should have gone with Broadwell-E for a 10-core with high single-threaded memory/IPC performance? I held off on X99 because of Haswell-E memory controllers dying en masse.

Depends what you’re comparing it against. If I had to get the best single-threaded memory perf/IPC chip now, I’d go for an Alder Lake or Raptor Lake part, as long as I didn’t need >128 GB of RAM (coincidentally, those two architectures aren’t mesh architectures).

I’ve got an E5-2697A v4 system (kinda like Broadwell-E in that it’s relatively few cores at high clocks) that goes through mixed single-thread/multithread simulations pretty fast.

I’ve got a dual 2697A v4 system and it’s almost exactly half the speed of my 7713P main system in highly threaded SSE2 and AVX2 workloads.

I agree, if you want CPU compute on a budget Broadwell is still a great choice :+1:

I might just make do with a 9900K with a high all-core frequency.

The 9900K was about the end of the line for the ring bus Intel used to use. There’s also the 9900X, which is the HEDT part of the same generation; X299 was kind of an overlooked platform.

EPYCs are finally getting cheap enough second-hand that I can add them to my compute farm… I was hoping there would be some kind of breakthrough in HPC, like HBM processors, so I could justify buying a brand-new server (electricity bill savings are worth something, after all), but alas that has not happened yet.

Which is why I’m avoiding the 10900K: the extra silicon and cores pushed it past the balance point between core latency and memory subsystem latency.

Comet Lake’s 10 cores on a ring bus was pushing it a little too far. Intel did use a dual ring bus on the higher-end CPUs of the same generation to pretty good effect, though.
