Seems Threadripper and Epyc are based on the same chip

Interesting vid delidding a Threadripper chip.

It does not seem logical to only use two of the dies. Using a CCX module from each die would give huge benefits in heat dissipation without hurting performance.


You are second actually … but nice try :smiley:

just 10 min time difference

That would be because that post is not in the CPU section.

:stuck_out_tongue:

I just saw that and corrected my post

If this hasn’t been corrected anywhere, then it looks like it is two dies for the functioning cores and two dies that either have not gone through the full manufacturing process or are known dead dies. They are doing this for package rigidity, since just having a void where two of the dies should be would make the IHS flex in those areas. That’s not a good thing. So they are just using some silicon as a shim.

I believe I read that at Anandtech.

The 1950X Threadripper will use 4 CCX modules with 4 cores each, 2 dual-channel memory controllers and 2 PCIe/IO controllers that can potentially be located anywhere on the 4 dies.

There is no latency penalty for any component to communicate with any other component over the Infinity Fabric, regardless of which die it resides on; that is the key feature of the fabric. Epyc only sees a penalty in dual-socket systems if threads have to switch between sockets.

I would think that from the perspective of distributing heat over a wider area, it makes sense if each die supports a CCX module, with two dies hosting the two memory controllers and two dies supporting the PCIe controllers. That gives them flexibility to put the memory controllers in the dies closest to the memory slots to improve latency, but it also gives them flexibility to move everything around and use two complete R7 dies if they want to.

The “shim” story is only that, a fairy story.

AMD are doing the same thing with Epyc/Threadripper that they have done with R7, R5 and R3. The dies with dead/faulty sub-components are used to produce the lower-level models. The die may be otherwise perfect with the exception of a single defect in one core of a CCX. You don’t throw that away; you disable that faulty CCX module and retask the die in a lower-end SKU where that CCX module is not needed.

Remember, silicon by itself is expensive. If it truly were being used only as a shim for support, AMD has plenty of off-cuts from the edges of the wafers that they could have recycled and used instead of sacrificing silicon that could have gone into additional chip manufacturing.

Threadripper gives them a range of SKUs with much higher profit margins than any R5 or R3 chips, while using basically the same rectangles of silicon.

AMD has officially stated that the other two ‘dies’ are not dies at all. They’re placeholders purely for support. They even went on to state that the active ones are always diagonally across from each other, which makes sense from a thermal perspective (they’re more spread out that way).

Dies with faulty cores can still be used for the 12 and 8 core versions of TR. There is really no reason to need two extra dies to make the 16 core version unless your failure rate is abnormally high (and that does not appear to be the case based on standard Ryzen; 8 core versions are widely available).

I suppose AMD could be lying about all of this, but why? Until somebody can provide actual evidence to the contrary, I think the safest bet is that the dummy dies are there because they are using the same basic package as Epyc.


I personally wouldn’t be surprised if Epyc CPUs worked just fine on the X399 platform, just like Xeons worked fine on the X58, X79, X99, etc. platforms… (in some funny cases Xeons were cheaper than desktop parts, while being the same CPUs with the overclocking multiplier locked). Not to mention Threadripper offers much higher clocks than Epyc.


I know they said that. They were lying. They just don’t want to make it easy for the competition by broadcasting the real details of how their architecture works. Much better to be able to discredit them when they say things like “glued together” CPUs.

You are not thinking about this terribly deeply, are you? Why would anyone use material that they could otherwise use to make more product that generates profit, when they have a huge pile of chips that would not otherwise be usable?

The way Zen works, as long as everything is on the same package, there is absolutely no disadvantage to having the elements of the chip on different dies. Why would you use your A-grade, perfect 8 core dies on a Threadripper that will sell for $1000 when you could use them on a 32 core Epyc CPU that you can sell for $4000?

Particularly if you have a ready supply of lower-grade dies that didn’t pass the full 8 core test, and your architecture gives you the flexibility to put those lower-grade dies in any of the four die positions on the package without penalty to performance, while letting you sell them in a $550–$1000 product instead of a $100 R3.

As far as I am aware, the outside pins are the same. The PCIe pads in the squares in the middle of the socket are different.

At least that is the case with the Asus Zenith board.

I doubt it would work given all the extra PCIe lanes that Epyc chips have

well then they are different after all :wink:


Yeah, because they totally don’t know that reviewers like PCPer will do core-to-core latency tests that would show the differences in latency if they were actually spreading 4 working CCXs of 4 cores across all four dies, or if they were (and this is definitely what they are actually doing) using two dies with 8 working cores each.

If they were lying, they’d be HAMMERED by the reviewers. They know that, and this isn’t something worth lying about.

I’m not thinking about this terribly deeply? You are the one claiming that they are shipping every Threadripper processor with twice as many cores as their highest-end part uses (and four times as many as their low-end part) in order to save material costs. Clearly I was being too polite in my earlier reply: that is lunatic fringe nonsense.

Yes, there will be some dies where some cores are bad. But nowhere near that often, unless they were having some sort of terrible manufacturing problem, of which there is zero indication in standard Ryzen, and they’re the same dies. It is not normal for your highest-end part to have anything besides the theoretical maximum number of cores. Salvaging dies with bad cores on them is the reason not all parts will be sold as 16 core parts. What you are suggesting is massive overkill, and a massive waste of resources.

And since you seem to think blank silicon is prohibitively expensive (which it isn’t), I suggest you read this:
The real cost of making chips


While there is a difference between switching threads within the same CCX and switching to another CCX, there is absolutely no penalty or difference between switching a thread from a core on one CCX to the other CCX on the same die and switching it to a CCX on a die on the other side of the fabric.

That is what makes the fabric so revolutionary. Those silicon rectangles are only like scaffolding that holds the components in place on the fabric to provide a fixed address. There is no test that reviewers can do that can determine where on the package two different CCX modules are located by measuring timings. But to what end? It makes absolutely no difference to the user.

There is no increase in thread switching latency until you get to a dual socket motherboard and threads are switching between different sockets.

Servethehome.com have already measured dual-socket Epyc and shown that.

AMD are only trying to stop the “can I enable the other 16 cores” speculation.

You can totally measure the latency, as PCPer did, to see how many CCXs are being used. If this were a 4-die CPU (à la Epyc), there would be many more of those “X” shaped crosses in the graph. Using the simple program PCPer created, the number of active CCX units can be tested.

https://www.pcper.com/reviews/Processors/Ryzen-Memory-Latencys-Impact-Weak-1080p-Gaming
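
For anyone who wants to poke at it themselves, here is a rough sketch of the kind of core-to-core “ping-pong” test that tools like PCPer’s are built on (my own toy version, not their actual program): pin two threads to specific cores and bounce a flag back and forth through a shared cache line. The core numbers and iteration count are placeholder assumptions; you would sweep every core pair.

```c
/* Toy core-to-core latency probe: bounce an atomic flag between two pinned
 * threads and report the average round-trip time. Run it over every core
 * pair; pairs that share a CCX should come back noticeably faster than
 * pairs that cross the fabric. Linux-only (sched affinity), build with -pthread. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 1000000
static atomic_int flag;                     /* the shared cache line */

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *responder(void *arg) {
    pin_to_core(*(int *)arg);
    for (int i = 0; i < ITERS; i++) {
        while (atomic_load(&flag) != 1) ;   /* wait for ping */
        atomic_store(&flag, 0);             /* send pong */
    }
    return NULL;
}

int main(void) {
    int core_a = 0, core_b = 4;             /* placeholder pair: sweep these */
    pthread_t t;
    pthread_create(&t, NULL, responder, &core_b);
    pin_to_core(core_a);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ITERS; i++) {
        atomic_store(&flag, 1);             /* ping */
        while (atomic_load(&flag) != 0) ;   /* wait for pong */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    printf("cores %d <-> %d: %.1f ns per round trip\n", core_a, core_b, ns / ITERS);
    return 0;
}
```

Plot the results for every core pair as a matrix and the CCX grouping PCPer shows falls straight out of it.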

Your last reply repeated back half of what I had just said in my previous post.

What do you mean there is zero indication in standard Ryzen?

Firstly, there is no such thing as a 100% yield. You bin the chips that you have manufactured and sort them into different piles based on how much of the chip works and what performance levels the chip supports. You use the top-tier dies to make the 1800X, and the chips that all work but have not binned as well make 1700X and then 1700 chips. If one or two cores are faulty, you laser off anything that is surplus to requirements and make R5 1600 chips, and if an entire CCX is not working you make the quad core chips, either with or without SMT.

The chips that work except for a dead PCIe or memory controller are stockpiled until you launch a product like Threadripper that uses 4 dies and can mix and match CCX, memory and PCIe controller locations anywhere you want to have them on the package.
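
To spell out that sorting flow, here is a toy sketch of the binning logic as described above. The SKU names follow the post; the exact thresholds and the io_ok flag are my own illustrative assumptions, not AMD’s real binning rules.

```c
/* Toy binning: decide where one Ryzen die ends up, based on how many cores
 * pass on each CCX, whether it hit the top clock bin, and whether its
 * PCIe/memory controllers work. Illustrative only, not AMD's actual rules. */
#include <stdio.h>

const char *bin_die(int good_ccx0, int good_ccx1, int top_bin, int io_ok) {
    int total = good_ccx0 + good_ccx1;

    if (!io_ok)                               /* dead PCIe/memory controller */
        return "stockpile for a multi-die package (Threadripper/Epyc)";
    if (total == 8)                           /* everything works */
        return top_bin ? "R7 1800X" : "R7 1700X / 1700";
    if (total == 6)                           /* one core lasered off per CCX */
        return "R5 1600X / 1600";
    if (good_ccx0 == 0 || good_ccx1 == 0)     /* a whole CCX disabled */
        return "quad core, with or without SMT";
    return "some other salvage SKU";
}

int main(void) {
    printf("%s\n", bin_die(4, 4, 1, 1));   /* perfect die, best bin        */
    printf("%s\n", bin_die(3, 3, 0, 1));   /* one bad core per CCX         */
    printf("%s\n", bin_die(4, 0, 0, 1));   /* one CCX dead                 */
    printf("%s\n", bin_die(4, 4, 0, 0));   /* cores fine, dead controller  */
    return 0;
}
```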

Secondly, the other evidence from the single-die products is the existence of the R5 and R3 SKUs.

Third, if you are selling a 4 core R5 or R3 CPU, what difference does it make if the silicon on the package is holding an extra 4 cores that are inactive? It does not mean that the active 4 cores are faulty; the fault could be as minor as a small group of transistors in a key location knocking out an entire CCX module. The rest of the silicon on the die may be perfect.

You realize that when you mass-produce things, a component that costs an extra $1 on your BOM has just cost you an extra $1 million if you manufacture a million chips. That either reduces your profits and displeases the shareholders to whom you are beholden as the CEO, or ends up increasing the cost of the product by many dollars on the shop shelf, reducing the number of sales as fewer and fewer people can afford to buy it if you are selling $1000 CPUs.

I would like to point out a couple of things: neither of you is wrong, except for saying AMD is lying. As production stands now there are a number of ways to get to 16 cores for Threadripper, but AMD is just ramping up production. What are they going to be doing? The most efficient possible thing: getting the most working, fully-specced CPUs out the freaking door as fast as possible. So, AMD has stated the 2 dies are “dummies”; all that means is that at this point in time they are putting 2 full-spec dies on a Threadripper MCM for best possible performance. The 2 “dummies” could be anything from a die that only has 3 working cores to scrap that has been recut to fit the space. There is also the theory that they are the outer-edge dies of the wafer, that they are of the lowest quality, and that they are in fact connected to the fabric as passthroughs for power and comms only, completely bypassing any cores or cache or controllers.

The way a reviewer would be able to tell which CCXs are being used is not through latency though, it’s heat. If they put 4 dies each with 4 cores operational, the heat spreader would show the glow in all 4 spots where those working CCXs are, so no, AMD cannot say they are only using 2 dies and then use all 4, at this time.

Now, that does not mean that as the product develops and demand increases they are unable to start mixing and matching CCXs and cores and cache. So on the 8 core they could have any of the 4 dies with a 4 core unit; for the 12 they could use 3 separate dies with 4 cores, or one 8 core die and one 4 core, or all 4 dies with 3 cores. There absolutely will be no penalty inside the MCM for mixing and matching in the future, for when they want to drop a 10, 14, 18, 20, 22 and 24 core product stack into the mix. Look at the numbering: 1900 = 8 core, 1920 = 12 core, 1950 = 16 core; so let’s see, 1910 = 10 core, 1940 = 14 core, 1960 = 18 core, 1970 = 20 core, 1980 = 22 core, 1990 = 24 core, or some such thing. They could also skip any of those numbers and use 5s or whatever they want. Intel thinks their 18 core is going to matter when it shows up in December or January; Threadripper will be introducing a 24 core in December.

From what I understand, the PCIe, power and other comms all go through the same process no matter which CCXs are live, so they can activate and utilize up to 32 cores on a Threadripper MCM just as easily as on an Epyc. They will just have 64 PCIe lanes run through the socket and managed internally; this is, after all, an SoC, so they can change the microcode to tell the MCM to do whatever they want.


That graph is only showing the difference between switching between cores on one CCX and switching to cores on a different CCX. Given that it is from a Ryzen, there is only a single die on the package to test.

That was not what I said. I said it makes no difference switching threads to a different CCX on the same die or to a CCX on a different die. You are making the same mistake Intel made by saying it was “glued together”.

Zen is not the same as Intel. The rectangle of silicon you see on a delidded package is not a monolithic, self-contained chip. The silicon only holds the CCXs and the controllers and allows them to have a fixed address on the underlying fabric. The Zen chip is limited by the size of the Infinity Fabric, which in turn is limited by the size of the socket you use.

Think of the Zen architecture sort of like a dining table with a tablecloth, smart plates, knives and forks that can all communicate with each other through the material of the tablecloth. Knife A has to go through the same tablecloth to communicate with Plate A as it has to go through to communicate with Plate D or E or Fork F.

The end result is that it doesn’t matter which table setting the different elements are located at; it always goes through the tablecloth. The only exception is if one part of Plate A with the meat wants to communicate with another part of Plate A with the vegetables.

How big your tablecloth is and how many plates, knives and forks you can set out are only determined by how large you can lay out the tablecloth, which is limited by the underlying table.

Of course you can also lay out a single place setting and only have a single plate, knife and fork, which is analogous to what the Ryzen chips are.

I know that it sounds cartoonish, but that is exactly how Zen is connected together: any core has to use the fabric to communicate with the memory or the PCIe controller, and that is what makes it so clever. It means that if you want a 48 or 64 core chip, as long as the number of cores is divisible by 4, you just make the fabric bigger (or make the dies smaller) and attach more dies that hold the extra cores and controllers. Having the fabric as the star of the show makes engineering new model chips much cheaper.
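
To make the tablecloth picture concrete, here is a toy model of exactly that claim (a sketch of the idea being argued, not measured Zen numbers): the only shortcut is two cores inside the same CCX; every other pair pays one identical trip across the fabric, no matter which die each end sits on.

```c
/* Toy model of the "everything goes through the fabric" claim.
 * ccx is the CCX index on that die; -1 marks a memory/PCIe controller. */
#include <stdio.h>

struct component { int die; int ccx; };

/* In this model the only shortcut is two cores inside the same CCX;
 * every other pair pays one (identical) trip across the fabric. */
int cost(struct component a, struct component b) {
    if (a.ccx >= 0 && a.ccx == b.ccx && a.die == b.die)
        return 1;   /* intra-CCX */
    return 5;       /* one hop over the Infinity Fabric */
}

int main(void) {
    struct component core0  = { 0, 0 };   /* die 0, CCX 0 */
    struct component core3  = { 0, 0 };   /* die 0, CCX 0 */
    struct component core4  = { 0, 1 };   /* die 0, CCX 1 */
    struct component core12 = { 1, 0 };   /* die 1, CCX 0 */
    struct component memctl = { 1, -1 };  /* memory controller on die 1 */

    printf("same CCX            : %d\n", cost(core0, core3));
    printf("other CCX, same die : %d\n", cost(core0, core4));
    printf("CCX on the other die: %d\n", cost(core0, core12));
    printf("memory controller   : %d\n", cost(core0, memctl));
    return 0;
}
```

In this model a cross-die hop costs exactly the same as a cross-CCX hop on the same die, which is the point being argued.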

Intel are trying to do a similar thing to modularize their chips with the mesh they have used on Skylake-X, but they have still limited themselves by making the block of silicon the star and sticking the mesh underneath almost as an afterthought.

OK, I’ll explain it a third time…

8 core Ryzens are widely available. They’re not overpriced (yes they are higher end parts so they cost more but there’s no particular jump in price), they’re selling very well, and they’re not out of stock anywhere. There is absolutely no indication that AMD is having any trouble making 8 core dies.

Two 8 core dies give you 16 cores. Yet you are claiming they need 4 dies to reliably get to 16 cores. That would only be the case if their manufacturing were completely screwed up and yields were terrible. And if that were the case, then 8 core Ryzens would either be hard to find or extremely overpriced. It’s the same die, so you can’t have it both ways here: either they can reliably produce an 8 core die or they can’t.

Yes, of course not all dies are 8 cores. Nobody ever said they were. But Threadripper comes in 8 and 12 core versions too. Do you think they would need 4 dies (with a theoretical maximum of 32 cores) to make an 8 core version? Two core versions of standard single-die Ryzen don’t even exist.

The only things that are new are the substrate, the socket and the size of the fabric. The dies are the same as the Ryzen ones, and they have been making those for months now.

The conversation about whether they are dummy dies or not is really rather pointless, and is only a topic for discussion because a German guy delidded a soldered-on chip and got a surprise when he saw four rectangles of silicon under the spreader. It makes absolutely no difference to the end-user experience. It does, however, waste AMD resources, having to spend time discussing why there are four dies there, or dealing with lost sales from any successful hack, or returns from idiots who have tried to hack the chip to enable 32 cores and broken it.

I mentioned measuring temperature as well, but without delidding the chip, finding instruments sensitive and fast enough to measure the variations will be a challenge. I guess that it will happen if enough people stay hot and bothered over something that is really a non-event.

I know that they can disable elements, just as Nvidia does on their GPUs, but I do not know what process AMD are using to disable the redundant elements on these chips. I assume that they laser the fabric interconnect points for the relevant elements they are disabling to make it permanent. I doubt that they would just be using microcode, at least not in the first release when everyone and his dog is trying to find ways to hack it and re-enable the cores.

The architecture does make it trivial to increase core count up to 32 cores for new SKUs.
