Last PCPer podcast had some interesting Ryzen discussion:
(Pay no attention to the man with the console)
Ryan talks a bit about his piece on 1700 overclocking at https://youtu.be/4aEw3e-je9w?t=6m22s
Then at https://youtu.be/4aEw3e-je9w?t=35m23s they get into more Ryzen stuff. They also talk about XFR (an often misunderstood feature), that Biostar mini-ITX board, and a bit about Naples.
Some interesting bits: Josh Walrath notes the difference between production and gaming benchmarks. I think he is on to something about it being a limitation in the Infinity Fabric connection between the CCXes. Others have written about it:
This explains the fabric the CCXes live on:
The Data Fabric is responsible for the core’s communication with the memory controller, and more importantly, inter-CCX communication. As previously explained, AMD’s Ryzen is built in modular blocks called CCX’s, each containing four cores and its own bank of L3 cache. An 8 core chip like Ryzen contains two of these. In order for CCX to CCX communication to take place, such as when a core from CCX 0 attempts to access data in the L3 cache of CCX 1, it has to do so through the Data Fabric. Assuming a standard 2667MT/s DDR4 kit, the Data Fabric has a bandwidth of 41.6GB/s in a single direction, or 83.2GB/s when transferring in both directions. This bandwidth has to be shared between both inter-CCX communication, and DRAM access, quickly creating data contention whenever a lot of data is being transferred from CCX to CCX at the same time as reading or writing to and from memory.
Edit: I'm not sure his numbers are right; AMD has specified that the fabric runs at half the memory speed. I guess it is possible to run RAM at 3200 though.
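To see where numbers like that come from, here's a back-of-the-envelope calculation. The 32-bytes-per-fabric-clock width and the half-transfer-rate fabric clock are community estimates, not confirmed AMD specs:

```python
# Back-of-the-envelope Data Fabric bandwidth, per the half-memory-speed
# rule. Assumptions (not official AMD figures): the fabric clock equals
# half the DDR transfer rate, and the fabric moves 32 bytes per clock
# in each direction.

def fabric_bandwidth_gbs(mt_per_s, bytes_per_clock=32):
    """Estimated one-directional fabric bandwidth in GB/s."""
    fabric_clock_mhz = mt_per_s / 2   # DDR4-2667 -> ~1333 MHz fabric clock
    return fabric_clock_mhz * bytes_per_clock / 1000

for kit in (2133, 2667, 3200):
    one_way = fabric_bandwidth_gbs(kit)
    print(f"DDR4-{kit}: ~{one_way:.1f} GB/s one way, ~{2 * one_way:.1f} GB/s both")
```

For DDR4-2667 this lands around 42.7 GB/s one way, in the same ballpark as the article's 41.6 GB/s, and going to DDR4-3200 buys roughly 20% more fabric bandwidth under these assumptions.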
Which takes us to the point:
Memory Scaling
To put things into perspective, communication to and from cores to L3 cache inside the same CCX happens at around 200GB/s. Combine this with the massive difference in latency you would expect from having to go through extra hoops to reach the other CCX, and the communication speed between CCX’s is simply not anywhere near intra-CCX communication speeds. This is why changing the Windows scheduler to keep data from a single thread in the same CCX is so important, as otherwise you incur a performance penalty, as observed in gaming performance tests.
As observed in multiple tests, Ryzen appears to scale quite well with memory speeds, and this knowledge sheds light on why. It’s not necessarily the increased speeds with the DDR4 kits themselves, though it certainly helps. Rather, it’s the increased internal bandwidth for inter-CCX communication, which alleviates some of the performance issues when threads have to communicate between CCX’s.
Due to this, if you’re picking up a Ryzen system, it’s highly recommended to get a decently fast memory kit, as it will help performance more than you would otherwise expect.
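Until schedulers learn the CCX topology, one blunt workaround is pinning a process to a single CCX yourself. A minimal Linux sketch; the CCX 0 core numbering below is an assumption, as the real mapping depends on SMT and firmware:

```python
# Sketch: keep a process's threads on a single CCX by restricting its
# CPU affinity. Assumption: logical CPUs 0-3 belong to CCX 0 -- the
# actual mapping varies with SMT and firmware, so verify it first.
import os

CCX0_CPUS = {0, 1, 2, 3}  # hypothetical logical CPUs of CCX 0

def pin_to_ccx0(pid=0):
    """Restrict pid (0 = current process) to CCX 0's cores (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        available = set(range(os.cpu_count()))
        os.sched_setaffinity(pid, CCX0_CPUS & available)

pin_to_ccx0()
```

On Windows the equivalent knob is the process affinity mask (Task Manager, or the SetProcessAffinityMask API).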
So I certainly think Josh Walrath is on the right track. The question is what can be done about it; we're still working out exactly how it works.
Allyn also touches on some oddities he saw when doing storage benchmarks on NVMe drives. At a single thread and high queue depths, Ryzen only hits about half the maximum IOPS compared to Intel. This is a bit odd, but not something a desktop user has to care about, as you never hit those maximum queue depths (16-32) on a desktop [1]. Still, it is an oddity that might shed some light on Ryzen's performance quirks. Maybe it scales with memory speed too?
When talking about Naples, Ryan noted that the CPU sockets will talk over 64 lanes of PCIe using Infinity Fabric as the protocol. I hadn't heard that yet; interesting. Charlie over at SemiAccurate has written a short article on it:
I find this interesting; Ryzen might be a more complicated animal than we initially thought. Maybe the big thing will be to clock memory as high as possible to get the Infinity Fabric running as fast as possible?
[1] Allyn has talked about this before: for desktop use it is important to test at queue depths of 1-2. Those really high IOPS numbers stated in the specs are almost always at a QD of 32, i.e. only for server loads; you'll never see that on a desktop. Typical BS marketing numbers.
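The QD dependence falls straight out of Little's law: sustained IOPS ≈ outstanding requests / average completion latency. A quick illustration, using a made-up but plausible NVMe latency:

```python
# Little's law: IOPS ~= queue depth / mean completion latency.
# The 80 microsecond latency is a made-up but plausible NVMe figure,
# just to show why spec-sheet IOPS need QD32.

AVG_LATENCY_S = 80e-6  # assumed average completion latency

def iops(queue_depth, latency_s=AVG_LATENCY_S):
    return queue_depth / latency_s

for qd in (1, 2, 32):
    print(f"QD{qd:>2}: ~{iops(qd):>9,.0f} IOPS")
```

With that latency, QD1 gives about 12,500 IOPS while QD32 approaches 400K, which is why the marketing numbers only show up at queue depths desktops never reach.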