Someone asked a question about XMP crashing apps vs the OS…
That got me thinking - with all the signal integrity XMP/ECC nonsense going on, wouldn’t it make sense to transfer data over differential pairs?
Are there any hardware engineers around who could explain the latency / power tradeoffs?
I’m not one, but OTOH that would probably require going serial to cut down on the wiring needed, so how much latency would that add? And could you design RAM to “run over a wet string”?
Even though a RAM read/write today takes ~10 ns on the DRAM side of things, from the CPU’s perspective it’s more like 40-100 ns (there are multiple layers of caching, prefetching, and TLBs, so it varies a lot in practice from the instruction’s perspective).
Naively: a cache line is 64 bytes (512 bits); add 4 bytes (32 bits) for the address, a 32-bit CRC, and a 32-bit header. That’s 76 bytes, so let’s round it to 80 bytes / 640 bits per read/write command.
If we have one differential pair (2 wires) and a 1 GHz signaling rate, is that 640 ns per transfer?
- What if multiple differential pairs are used (e.g. 8 pairs / 16 wires)? 80 ns?
- What if we did multi-level signaling on each pair (possible because of differential pairs, or am I wrong?), e.g. 4 or 8 levels? Is more possible? 8 levels is 3 bits per symbol, so it brings the 80 ns down to ~27 ns per packet; you’d need 256 levels to hit 10 ns (see the back-of-envelope sketch after this list).
- Is there something like QAM for these kinds of bitrates that can let us cheat here?
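Here’s the serialization arithmetic as a rough Python sketch, under my own simplifying assumptions: 2 levels (NRZ) carry 1 bit per symbol, 8 levels carry 3, “signaling rate” means symbols per second per pair, and I’m ignoring encoding overhead (8b/10b, FEC), SerDes latency, and the memory controller entirely:

```python
# Back-of-envelope serialization time for the ~640-bit packet above.
# Assumptions are mine, not from any spec: levels -> log2(levels) bits per
# symbol, symbol_rate_hz symbols per second per pair, no coding overhead.
from math import ceil, log2

def serialize_ns(payload_bits: int, pairs: int, symbol_rate_hz: float, levels: int = 2) -> float:
    """Time (ns) to clock payload_bits out over `pairs` differential pairs."""
    bits_per_symbol = log2(levels)                   # 2 levels -> 1, 8 levels -> 3
    symbols_per_pair = ceil(payload_bits / pairs) / bits_per_symbol
    return symbols_per_pair / symbol_rate_hz * 1e9

packet = 640  # 64 B line + address + CRC + header, rounded up

print(serialize_ns(packet, pairs=1, symbol_rate_hz=1e9))              # ~640 ns
print(serialize_ns(packet, pairs=8, symbol_rate_hz=1e9))              # ~80 ns
print(serialize_ns(packet, pairs=8, symbol_rate_hz=1e9, levels=8))    # ~27 ns
print(serialize_ns(packet, pairs=8, symbol_rate_hz=1e9, levels=256))  # 10 ns
```

So one pair at 1 GHz really is ~640 ns of wire time before any controller, SerDes, or DRAM latency is added; you need more width, a higher symbol rate, more bits per symbol, or some combination of the three to get back into the tens of nanoseconds.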
And what about the tradeoffs in pin count (a DDR4 DIMM has 260/288 pins)? The packages carrying CPU silicon already have tons of layers and pins, as do motherboards, and all of that increases cost and complexity.
Can we have 32- or 64-channel memory for CPUs with 128 logical cores, the same way we have dual-channel on desktops and octa-channel on EPYC today?
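For a feel of the pin budget, here’s some very rough arithmetic. The serial per-channel pin count is purely my own assumption (8 RX pairs + 8 TX pairs + a clock pair), and 288 is the DIMM edge-connector count quoted above, which includes power and ground, so treat this as order-of-magnitude only:

```python
# Rough pin-count comparison: wide parallel DDR-style channels vs narrow
# serial ("SerDes-style") channels. Per-channel figures are illustrative
# assumptions, not taken from any real standard.
DDR4_PINS_PER_CHANNEL = 288            # DIMM connector pins, incl. power/ground
SERIAL_PAIRS_PER_CHANNEL = 8 + 8 + 1   # assumed: 8 RX + 8 TX data pairs + 1 clock pair
SERIAL_PINS_PER_CHANNEL = SERIAL_PAIRS_PER_CHANNEL * 2

for channels in (2, 8, 32, 64):
    ddr = channels * DDR4_PINS_PER_CHANNEL
    ser = channels * SERIAL_PINS_PER_CHANNEL
    print(f"{channels:2d} channels: ~{ddr:5d} DDR-style pins vs ~{ser:4d} serial pins")
```

Which is basically the argument that took expansion slots from wide parallel PCI to narrow serial PCIe lanes: at 32 or 64 channels, a couple of thousand serial pins is at least imaginable, while ~18k parallel pins is not.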
How could this work?
Would it even make sense?
What if we said f***-it, we’ll have 4 KiB cache lines and 1-4 GB of L3/L4 cache instead, plus variable-length read/write packets? Would that make lives easier or harder?
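Under the same toy model as above (no protocol overhead, no controller/SerDes latency; the 16 GT/s rate is just a PCIe 5.0-class number I picked for comparison), a 4 KiB line looks like this:

```python
# Serialization time for a 4 KiB line, same simplistic model as before.
from math import log2

def serialize_ns(payload_bits, pairs, symbol_rate_hz, levels=2):
    return payload_bits / (pairs * log2(levels)) / symbol_rate_hz * 1e9

line_4k = 4096 * 8  # 32768 bits, ignoring header/CRC this time

print(serialize_ns(line_4k, pairs=8, symbol_rate_hz=1e9))             # ~4096 ns
print(serialize_ns(line_4k, pairs=8, symbol_rate_hz=1e9, levels=8))   # ~1365 ns
print(serialize_ns(line_4k, pairs=8, symbol_rate_hz=16e9))            # ~256 ns
```

So huge lines only pay off if the link is very fast or you actually wanted most of the 4 KiB anyway; otherwise you’re spending microseconds of wire time (and cache capacity) on data you never touch. My guess: easier for streaming workloads, harder for pointer-chasing ones, but I’d love a hardware person to weigh in.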