AMD 32 Core vs. Intel 28 Core

Was that a Freudian slip? :smiley:


On the matter of using EPYC interposers to get 8 memory channels - why? Here are my thoughts on why not:

  • EPYC interposers are more complex and therefore must be more expensive to produce. AMD already have a working, simpler interposer design that they used on 1st gen Threadripper.

  • AMD would be going back on their promise of (backward) compatibility.

  • It benefits AMD for EPYC to have multiple advantages over Threadripper, so they can sell EPYC at a higher price.

  • Having use for dies with failed memory controllers (and PCIe controllers) allows AMD to increase their effective yields.

  • Controllers for 4 extra memory channels will use power that, on workstation-type workloads, might be better spent on the processor cores.

1 Like

Phone auto correct being more cantankerous than usual… dry fingers?

  • TR1 was 2 hot and 2 dark dies. “Working… design” is only true for 16-core configs; 32 cores is new no matter what (though it is conceivable they tested it previously - I/we don’t have that info).
  • “More complex” is a relative term here. Yes, TR1/2 don’t use the off-chip Infinity Fabric channels, but the interposer is a PCB, so that sort of complexity is not really a barrier. TR is a subset of EPYC functionality, so you have dark wires? No biggie. The upside for them is one and only one interposer to build/fab/test.
  • They can market/sell on binning/qual/support and other security features. They don’t segment ECC support the way Intel does, but this wouldn’t make or break them (that, along with binning out bad memory channels, is the most compelling argument for them fusing off 2 of the memory channels).

Where I land is that it is entirely possible that either scenario is true - they may have fused off the other channels, or they may be relying on MBs not wiring them up. Given binning and BIOS control, it may not matter in either case because, as a practical matter, you won’t be able to get it to work.

TR is aimed at workstations! Who in their right mind builds a workstation (max stability and power efficiency are key factors) and then overclocks it?

TR has 4-channel RAM access, so two dies will have to ask their neighbours for RAM access or (even worse, latency-wise) shout into a general pool for RAM fetches. TR and Epyc will stay separate for the sake of target audience (high-performance single-socket workstation vs. high-performance multi-socket servers).

Probably not, because otherwise the dies would be good enough to become an Epyc CPU. This way, AMD can even use silicon with dead RAM controllers.

TR is aimed at HEDT, which is broader than just workstations, but even then… efficiency is a fairly distant nth factor in my personal world (within reason - I still need it to run from a wall socket and not cook my office).

Why the anger at OC? I very much understand efficient and easily cooled as a benchmark for a large segment. I divide my personal world into 2 categories:

  1. “stable, easy to cool, noisy as it wants to be, but water is less than ideal”. Those machines go into the rack. I don’t want to fiddle with or maintain them. I want to batch jobs to them and run them for weeks on end without concern for random issues. Those are mostly Xeons/ECC right now.

  2. "fast as it can possibly go on an (un)reasonable budget with COTS hardware." I use this sort of thing for debugging and exploration of software. I’m not going to die if it freezes or gets a random error, but to be honest, OC is just not all that difficult to get 24/7 stable these days. I ran my 7980XE at 4.5GHz for the BOINC Pentathlon - 2 solid weeks at roughly 100% non-AVX load. I’ve run multi-week high-load jobs for work projects as well. I can’t recall a single crash that didn’t occur while I was tweaking the OC intentionally.

Classic “latency vs. throughput”… If I want throughput, slow and wide is fine. If I want low latency, fast and as wide as I can go at that faster speed.

The way I see it is that if you want AMD to reach or exceed parity with what Intel is offering across the whole space, then you need to consider OC as well. As per prior posts, Intel is releasing product assuming LCD (Lowest Common Denominator) heatsinks, LCD VRMs, LCD PCBs/MBs, LCD memory, etc…

Their specs are wildly conservative relative to what 80-100% of their chips can do off the shelf with better cooling, more VRM capacity and tight tolerance PCBs/memory routing.

AMD is either going to need to reach this same point or they still have room for improvement.

Overclocking does not matter apart from publicity. It is a leftover from the time when chips came with 35MHz clocks and cases had turbo-buttons.

For benchmarks and competitions, there is overclocking. In the real world outside of tech communities, no one overclocks their daily driver, simply because a few FPS do not really matter or because people are afraid of touching anything in their BIOS/UEFI.

HEDT became the tech press’s favourite word after “gaming” and “pro” got put into product names.
Edit: Looked up HEDT; it is an Intel marketing term (from around the time of “megatasking”).

I’m sorry, but both of these assertions are demonstrably false.

First, I run my DD OC’d, and I’m not alone. If you look into the world of “trading platforms”, you will see that there is a booming, very high-margin market for even exotically cooled rackmount systems (but also workstations) that first overclocked existing Xeon/HEDT parts and then gained enough market clout to have Dell/Intel provide OEM SKUs with higher multipliers (e.g., the OEM 2696v3 @ 3.8GHz vs. the retail 2699v3 @ 3.6GHz, both 18-core parts).

Some examples - I don’t know anything about these companies and I don’t use my machines for high-frequency trading, but just to demonstrate to you what’s going on out there in the market:
http://www.broadberry.com/overclocked-rackmount-servers
http://www.icc-usa.com/overclockedservers.html

Second, OC head-room in the present context (the last 10+ years) has been a proxy for yield, and AMD is still playing catch-up here. The 10nm debacle might change that, or may have already, we’ll see…

Those models with more OC head-room are generally much more efficient at their marked speed (lower temp, lower voltage) than models that are running at the bleeding edge of their respective Si process.

This is a reflection of the steeply non-linear scaling of power consumption, resistance, etc. with voltage and frequency…
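
To put rough numbers on that, here’s a back-of-the-envelope sketch using the textbook dynamic-power relation (P ∝ C·V²·f). The stock and OC operating points below are made-up but plausible, purely for illustration:

```python
# Back-of-the-envelope dynamic power scaling: P is proportional to C * V^2 * f.
# The operating points below are illustrative assumptions, not measured values.

def relative_power(f_base_ghz, v_base, f_oc_ghz, v_oc):
    """OC power as a multiple of stock power, assuming P ~ V^2 * f."""
    return (f_oc_ghz / f_base_ghz) * (v_oc / v_base) ** 2

# Hypothetical part: 3.4 GHz at 1.00 V stock, 4.5 GHz at 1.25 V overclocked.
print(f"~{relative_power(3.4, 1.00, 4.5, 1.25):.2f}x stock power")  # ~2.07x
```

Roughly double the power for ~32% more clock - which is why the parts with OC head-room are the ones sitting well below the knee of that curve at stock.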

You are painting OC with the broad brush of marketing bluster, but the reality is that it both informs the market as to whose process is more robust and offers large niches the ability to get next-generation performance from current-gen parts, if they are willing to pay for better cooling, VRMs, and power consumption (the increased cost of which pales in comparison to the value of 20-50% more performance from any given generation of parts).

It’s not as different from the way 1st-gen Threadripper works as some may think.

Even first-generation TR has only 2 memory channels per die. The “quad channel” thing is a bit of a misnomer; any individual die can only directly access memory via 2 channels. The other two channels of the “quad channel” setup are reached via the Infinity Fabric and the other die’s memory bus. TRv2 simply adds another two dies that have no memory channels of their own and must share the other dies’ memory buses entirely.
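
A toy model of what I mean (the die numbering and the flat 1-hop cost are my own assumptions for illustration, not AMD documentation):

```python
# Toy model of a 4-die TR2 package: two dies own 2 DRAM channels each,
# two "compute-only" dies own none and must cross the Infinity Fabric.
# Die IDs and the 1-hop cost are illustrative assumptions.
LOCAL_CHANNELS = {0: 2, 1: 2, 2: 0, 3: 0}

def fabric_hops_to_dram(die):
    """0 hops if the die owns memory channels, else at least 1 fabric hop."""
    return 0 if LOCAL_CHANNELS[die] > 0 else 1

for die, channels in LOCAL_CHANNELS.items():
    print(f"die {die}: {channels} local channel(s), "
          f"{fabric_hops_to_dram(die)} hop(s) to DRAM")
```

Even on TR1, a die hitting the other die’s channels pays that same fabric hop; TR2 just adds dies for which every DRAM access pays it.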

Yes, memory bandwidth will be a concern in highly bandwidth-sensitive scenarios, but fortunately Zen has some big caches.

If you’re memory bandwidth limited - sure, you’d be better off looking at Epyc or Intel (maybe? see below). But there are plenty of processing-intensive workloads that are not so bandwidth constrained.

Remember, even on an 18-core Intel i9 CPU, there is still only quad-channel memory access for the entire die - you still have up to 18 cores fighting for 4 channels of memory. Sure, they don’t need to go via Infinity Fabric, but they still have the mesh bus, and they’re still over-subscribing those memory channels.

The topology may be different, but on, say, 16-core TR vs. 18-core Intel, each core overall only has roughly 1/4 of a memory channel’s worth of bandwidth to main memory.
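
Putting rough numbers on that 1/4-channel figure (assuming DDR4-2666 for the math - an assumption on my part; supported speeds vary by platform):

```python
# Rough per-core bandwidth budget on a quad-channel platform.
# One 64-bit DDR4-2666 channel moves 2666 MT/s * 8 bytes ~= 21.3 GB/s.
channel_gbps = 2666e6 * 8 / 1e9   # ~21.3 GB/s per channel (assumed speed)
channels = 4                      # quad-channel platform

for cores in (16, 18, 32):
    print(f"{cores} cores: ~{channels / cores:.2f} channels "
          f"(~{channel_gbps * channels / cores:.1f} GB/s) per core")
```

That works out to ~5.3 GB/s per core at 16 cores, ~4.7 at 18, and only ~2.7 at 32 if nothing else changes.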

Intel also does higher core count processors than 18 on quad channel boards in the Xeon range…

Waiting to see numbers on how much of a concern this really is, but the issue is less quad vs. octal channel memory than it is the path from a memory-less node to memory, which makes latency very uneven relative to Intel’s mesh or even ring approach (which had hit a wall scaling with core count).

The Epyc NUMA configuration already requires OS/application re-work to optimize for the cost of reaching any given node’s 2 channels. TR2 now presents a degenerate subset of that NUMA config, where 4 of the CCXs (or the OS on their behalf) cannot optimize their memory map to avoid adjacent-node hops over the fabric for memory access.
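
For context on what that re-work looks like: on Linux, the usual trick is pinning a job to one node’s cores so the kernel’s first-touch policy keeps its pages on that node’s local channels. A minimal stdlib-only sketch (the CPU IDs are hypothetical - check lscpu or numactl --hardware for the real mapping on a given board):

```python
# Minimal NUMA-aware pinning sketch using only the Python stdlib (Linux).
# Core IDs below are hypothetical; real node-to-core maps vary by board/BIOS.
import os

NODE0_CORES = set(range(0, 8))  # assume NUMA node 0 owns logical CPUs 0-7

# Restrict this process (pid 0 = self) to node 0's cores so the kernel's
# first-touch policy allocates its pages from node 0's local channels.
os.sched_setaffinity(0, NODE0_CORES)
print("running on CPUs:", sorted(os.sched_getaffinity(0)))
```

On TR2’s memory-less dies there is no “local” to pin to in the first place - which is exactly the degenerate case I’m describing.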

Again, we don’t know how bad this is until we see the chip in the wild, and even then its impact will be wildly uneven depending on application, but… it’s a problematic approach whose impact is likely felt long before you are “memory bandwidth bound”.

"Just buy Epyc"TM :wink: doesn’t address the top-end clock issue. Epyc’s are (for good reason) lower clocked parts.

True, but the question is not necessarily “how much of a problem is it” so much as “how does this chip for $X compare to competitor chip for $X”.

It’s definitely a compromise, but if it means you get 32 cores for the cost of, say, 16-18 or fewer from Intel (and they outperform Intel’s 16-18 core parts), it’s a win - even if in some significant number of cases those cores are memory starved.

More intel news…

Disabling HT on Intel will surely hurt that “IPC advantage”…

That, and real-time ML (used in this exploit) is a Pandora’s box for exploits well beyond the prior Spectre/Meltdown.

This further cements my assertion that we need to start thinking very differently about computer security. The idea that we are even converging on an inherently secure hardware solution is a fantasy that is getting more distant by the day.

This means re-thinking network structure, hardware as a service, cloud, etc… and the much harder question of addressing the human factor (the people doing the exploiting). As it stands there is less than zero consequence to the overwhelming amount of malicious activity on the net (less than zero because some/much of it is state sponsored).

This level of exploit and the performance problem(s) of stopping it and things like it demand partitioning of secure and insecure hardware and stamping out the malicious people and organizations exploiting bugs.

Crypto keys are like door locks. They keep honest people honest, but if you think they make your network or files impenetrable, you are fooling yourself.

1 Like

So, in order of relative VRM quality for 32-core/TR2 overclocking, how would you rank the popular X399 boards (e.g. Taichi vs. Zenith Extreme vs. Carbon AC)?

Wait until specs are released, then look at their respective datasheets?

1 Like

At least 10-12 phases, but 16 is highly recommended. Beyond that, I would wait for X399R2 motherboards to be released.
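
As a rough sanity check on those phase counts (the package power and core voltage below are assumptions for an overclocked 32-core, not measured figures):

```python
# Rough per-phase current estimate for a hypothetical overclocked 32-core.
power_w = 400.0   # assumed OC package power (illustrative)
vcore = 1.3       # assumed core voltage (illustrative)

total_a = power_w / vcore  # ~308 A total into the socket
for phases in (10, 12, 16):
    print(f"{phases} phases: ~{total_a / phases:.0f} A per phase")
```

That’s ~31 A per phase at 10 phases vs. ~19 A at 16, so more phases keep each power stage comfortably inside its rating and running cooler.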

1 Like

And as always, they’re only as secure as who holds the keys.

Crypto is all well and good but if someone you don’t necessarily trust holds the keys, you’re boned.

That, and root-of-trust/Certificate Authority infrastructure is ultimately a means of mass DoS…

Soon™ (not me, I didn’t do it…)

Yup. The whole root-CA thing is a house of cards, and I’m sure the US government has probably infiltrated a number of them and can issue certificates to impersonate whoever they like, if you trust the root CAs.

@thro @cekim this thread is for AMD’s 32 core vs. Intel’s 28 core. If you want to talk about the new HT exploit, feel free to start a thread about it.

1 Like

Fair enough. Figured it was relevant since, so far, it looks like AMD is unaffected, and disabling HT will have a massive impact on IPC, which is one of the advantages Intel has over AMD.

I.e., it is relevant to the performance comparison of the CPUs being discussed. If you turn off HT on Intel’s 28-core, it is only 28 threads instead of 64 with AMD…

2 Likes

You’re good. Disabling HT will be enough of a performance hit that it will remain on by default; plus, the vulnerability won’t really be affecting mainstream users.

Just trying to keep the thread on track