Well… it's slightly more insane than that. The limitations we're up against at this point are basically speed-of-light problems. If we make a compute core smaller, we can clock it way up because we don't have to wait for signals to propagate from one end to the other. If ARM adds a lot of instructions/complexity, upclocking it will be hard just because of wire length and transistor settle time.
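Some back-of-envelope numbers (my own, not from the post, and the 5 GHz clock is just an assumed figure): at 5 GHz a cycle is only 200 ps, and even at the vacuum speed of light a signal covers about 6 cm in that time. Real on-chip wires are far slower than that because of RC delay, which is why physically bigger logic is harder to clock up.

```python
# Rough upper bound on how far a signal can travel in one clock cycle.
c = 299_792_458        # speed of light in vacuum, m/s
freq = 5e9             # assumed 5 GHz clock, for illustration
cycle = 1 / freq       # seconds per cycle: 200 ps
reach = c * cycle      # best-case signal travel per cycle (real wires are much worse)
print(f"{cycle * 1e12:.0f} ps per cycle, at most {reach * 100:.1f} cm per cycle")
```

The printed bound is the absolute best case; actual wire delay on silicon eats most of that budget, so a core that fits in a few millimeters has real headroom a sprawling one doesn't.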
Similarly, Intel can't add much more complexity without things getting weird. So now we have things like AVX-512, which really does speed things up dramatically, but for all intents and purposes it's a totally separate, isolated, independently clocked compute unit.
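For a sense of why that unit pays off: a single 512-bit AVX-512 instruction like `vaddps` on a zmm register adds 16 single-precision lanes in one operation. Here's a toy lane model of that (pure Python, purely conceptual — not real intrinsics, and the function name just mirrors the mnemonic):

```python
# Conceptual model of one AVX-512 vector add: 512 bits / 32-bit floats = 16 lanes.
LANES = 16

def vaddps(a, b):
    """Model a single 512-bit vadd: all 16 lanes handled by one 'instruction'."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]

a = list(range(LANES))
b = [10.0] * LANES
print(vaddps(a, b))  # one op doing the work of 16 scalar adds
```

One instruction replacing 16 scalar adds is where the dramatic speedups come from, and also why the hardware behind it is big enough to end up as its own separately clocked block.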
ARM could do something like that. We're probably entering an age of heterogeneous silicon, simply because there isn't much low-hanging fruit left. We'll probably see 22nm memory, 14nm compute cores, and 7nm interconnects all mixed together. We might even see some AVX-512-style computation directly on DRAM in the near future, but probably nothing fast, since that'll take a special compiler.
I am blown away by how much x86 compatibility there is given how physically small the Ryzen compute die is. I would love to know what's going on there; AMD has pulled a rabbit out of a hat. And remember from the PCPer benchmarks that corner-to-corner die latency was measurably worse than between adjacent CCXes, and much worse than latencies within a CCX. Intel still has this dual ring bus in some CPUs, but they're moving to the same kind of silicon.
So what if ARM scales to x86 by adding an x86 translation core? Then it could be an almost overnight thing, because I'm betting that inside Zen they break down at least a good portion of x86 instructions into sequences of micro-ops that make more sense in a modern world. ARM could do the same thing, and it's basically "only" a pipeline modification. Transmeta published some nifty papers on this way back in the day.
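Roughly the idea, as a toy sketch (the instruction names and micro-op splits here are illustrative guesses, not Zen's actual decode tables): a read-modify-write x86 instruction like `add [mem], reg` gets cracked into simpler load/ALU/store micro-ops that a RISC-ish backend can schedule freely.

```python
# Hypothetical decode table: CISC-style instructions -> micro-op sequences.
UOP_TABLE = {
    "add [mem], reg": ["load  t0, [mem]",    # read the memory operand
                       "add   t0, t0, reg",  # ALU op on a temporary
                       "store [mem], t0"],   # write the result back
    "push reg":       ["sub   sp, sp, 8",    # adjust the stack pointer
                       "store [sp], reg"],   # then store through it
    "add reg, reg":   ["add   reg, reg, reg"],  # already simple: one uop
}

def crack(insn):
    """Translate one x86-style instruction into its micro-op sequence."""
    return UOP_TABLE[insn]

for insn in UOP_TABLE:
    print(insn, "->", crack(insn))
```

The point is that once the front end cracks everything into micro-ops like this, the backend is ISA-agnostic, so bolting a second decoder for another ISA onto the same pipeline is at least conceivable (which is essentially what Transmeta did in software).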
Someone, I think, took pictures of the Sandy Bridge silicon and overlaid them on CPUs up through Kaby Lake. In the photos it was the same-looking compute core surrounded by more and more silicon for specialized functions added in Haswell, Skylake, etc. It was scary how little had changed and how much was just "added on" as specialized compute units.
HBM CPUs are probably the next thing, with DRAM used as a sort of secondary RAM cache, I bet. Or possibly a new form factor where you manually insert HBM into a ceramic CPU carrier. That'd be neat. I should patent some obvious ways of doing that… think of an old-school-looking CPU like a Pentium Pro or Threadripper with "microSD"-looking slots for HBM on it. lol.