16 core LGA Ryzen with quad memory channels...
Anon sources, but it sure has looked a bit bad.
I don't think it is bizarre at all.
It supports what I have been saying: the high-fps load at 1080p is bottlenecked by contention in the data fabric, and is not a direct result of CCX thread switching and Windows scheduling. The CCX anomalies are a symptom of the DF contention. If we are not there already, the effect of SMT making games slower when you use a Titan XP or 1080 Ti at 1080p should be close to vanishing too.
Higher memory speeds increase bandwidth in the data fabric and the bottleneck is alleviated. In any environment, as the load on a shared resource approaches its capacity, response time climbs steeply rather than linearly.
In a 1440p or 4K workload, the CPU does fewer memory reads and writes and gets less frequent demands for frame data from the GPU, so the bandwidth load on the fabric that connects memory and PCIe to the CPU never reaches the point on the response time curve where it starts to go vertical.
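To put a rough shape on that response time curve, here is a minimal sketch using the textbook M/M/1 queueing formula R = S / (1 - rho). Treating the data fabric as a single M/M/1 queue is purely an assumption for illustration, and the function name and utilization values are hypothetical, not measurements from any Ryzen system.

```python
def response_time(service_time: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    Assumption: a single shared resource with random arrivals; this is
    a generic queueing model, not a model of the actual data fabric.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Illustrative utilization levels only; none of these are measured.
    for rho in (0.10, 0.50, 0.80, 0.90, 0.95, 0.99):
        r = response_time(1.0, rho)
        print(f"utilization {rho:4.0%}: response time = {r:6.1f}x service time")
```

In this model, going from 50% to 90% utilization takes the response time from about 2x to about 10x the service time, which is the "going vertical" behaviour described above. A lighter load, as at 1440p or 4K, would sit on the flat part of the curve.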
I like Digital Foundry's detailed benches...
Higher RAM clock speed, Intel vs AMD: AMD performs better with faster memory, while on Intel the gain is negligible at best. It is all about the CCX in this scenario. So I wonder what AMD would do with 4000 MHz RAM? Would it outperform Intel at 1080p?
Also, does anyone here have a Wraith Max cooler?
The problem is being exposed by Titan-level graphics cards putting such a heavy load, especially at 1080p, on the processor, memory and PCIe controller that it overloads the data fabric.
Different games put different levels of load on the computer. Explosions require many more calculations than a plain blue sky, for example. At the 150 fps that a Titan/1080-level card can produce, the calculations for the physical properties of the explosion, together with the processing required to draw 150 frames in a second, put more load on the computer than rendering the same game at the 80 fps you would get with a 1060, for example.
Data Fabric bandwidth is directly proportional to memory frequency. The Data Fabric is basically a network between elements on the chip. The functions that rely on it include CCX thread switching, memory access and PCIe controller access, and all of them require timely access over the network or performance is impacted.
Switching threads between CCX modules relies on the Data fabric being available on demand to switch the thread to the other CCX module.
When you overload a network you get contention for the available resources. Just like if you go to the bank to see a teller. If there are three tellers in the bank and you are first, you go straight in, do your business with teller number one and leave. Everything is quick and no contention. If you are customer number 4 in the queue, the first three get served quickly and you have to wait for the next available teller, slowing you down. The more people in the queue, the longer you have to wait.
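The teller analogy can be sketched as a tiny simulation. This is only an illustration of the queueing idea, assuming every customer arrives at the same moment and every transaction takes the same time. The function name and parameters are hypothetical.

```python
import heapq


def waiting_times(num_customers: int, num_tellers: int,
                  service_time: float = 1.0) -> list[float]:
    """Return how long each customer waits before reaching a teller.

    Assumptions: all customers arrive at t = 0 and every transaction
    takes exactly `service_time`.
    """
    # Min-heap of the times at which each teller next becomes free.
    free_at = [0.0] * num_tellers
    heapq.heapify(free_at)

    waits = []
    for _ in range(num_customers):
        start = heapq.heappop(free_at)  # earliest available teller
        waits.append(start)             # arrival is at t = 0, so wait == start
        heapq.heappush(free_at, start + service_time)
    return waits


if __name__ == "__main__":
    print(waiting_times(4, 3))  # customer number 4 has to wait
    print(waiting_times(7, 3))  # a longer queue means longer waits
    print(waiting_times(5, 5))  # enough tellers, nobody waits
```

With three tellers the first three customers wait nothing and the fourth waits one full transaction. With seven customers the last one waits two.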
If the DF is overloaded and the thread wants to switch when the DF is busy, it has to wait.
16 threads running at once with SMT all require more bandwidth to access the amount of memory that is required than if you only use 8 threads all accessing memory at once.
The Data Fabric clock runs at half the memory's rated speed. More clock cycles available to perform instructions and tasks make whatever you are using faster and raise the point at which contention becomes an issue. Faster memory provides a faster clock to the Data Fabric. It is basically the same as adding teller 4 and teller 5 to the bank counter, which means 5 people at a time don't have to wait in a queue.
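As a quick worked example of that half-rate relationship, the sketch below takes the 1/2 ratio at face value and reads "memory frequency" as the DDR4 rating (2666, 3200 and so on). The function name is hypothetical, and the kit speeds are just the ones mentioned in this thread.

```python
def fabric_clock_mhz(ddr_rating: float) -> float:
    """Data Fabric clock, assuming it is half the DDR4 rating.

    Assumption: the 1/2 ratio comes from the post above; no other
    divider or BIOS setting is modelled here.
    """
    return ddr_rating / 2.0


if __name__ == "__main__":
    baseline = fabric_clock_mhz(2133)
    for rating in (2133, 2666, 3200, 4000):
        clock = fabric_clock_mhz(rating)
        gain = clock / baseline - 1.0
        print(f"DDR4-{rating}: fabric clock {clock:7.1f} MHz "
              f"({gain:+.0%} vs DDR4-2133)")
```

Under that assumption, moving from a 2666 kit to a 3200 kit raises the fabric clock from 1333 MHz to 1600 MHz, roughly 20% more cycles for the same traffic.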
Because of the reliance on the data fabric, the observed CCX thread switching, scheduling and SMT on and off performance differences are all symptoms of the underlying Data Fabric contention issues, not the cause of slow Ryzen frame rates with 1080TI at 1080p. You do not see the same issues with a 1060 or 1070.
The Data Fabric architecture is completely new and has not been seen before in a computer processor. New creates different challenges to what went before, but to address a challenge you have to find it first. The majority of the tech media and industry, including it seems engineers at AMD, are trying to apply principles that they learned from Intel or FX chips to this new architecture and have not yet worked out that the Intel principles are only 95% the same. They have gotten themselves stuck fixating on Windows schedulers and thread switching. Those things are demonstrating anomalies, but the anomalies are a symptom of something else, not the primary cause. The reason is not Windows software; it is the underlying framework that they are ignoring because it is in that last 5% that is not being considered. The tests have only been done using the high-powered Nvidia cards. It could actually be caused by something in the Nvidia drivers.
The scheduler cannot be the root of the problem, because if it were, you would also see the same anomalies in Cinebench and other non-gaming benchmarks. But there is no evidence of that. Nor would SMT be shown to be more efficient than Intel Hyper-Threading in non-gaming benchmarks.
Faster memory is improving things. The way things are headed, together with faster memory, I have a feeling that an Nvidia driver or a BIOS setting will be found that reduces the graphics card's contribution to the load on the Data Fabric to below the level where contention becomes an issue, and the problem will be solved.
I still think there is something stinky in the IO, but so far not much has cropped up. I lack the equipment and skills to just hammer the entire thing.
There are problems that need to be addressed. They have not been addressed yet, and I don't know if they will ever be completely solved with the current silicon, although I am hopeful.
You should check out these new G.Skill RAM kits for Ryzen.
They have shown 4 dual rank 16GB dimms running at 3200 speeds at 14-14-14-34.
Pretty awesome stuff.
If that means investing in a Crosshair or similarly priced board... nope.
I am running X370 with 2666 and I don't really have any problem.
Some of the initial kits they have listed are being qualified on other boards though (not just the crosshair board):
(See the QVL Listing)
Hopefully that list expands while the list of actual Ryzen mobos grows.
I still think that 64GB of memory at 3200 CL14 with 8 cores is pretty neat, even if it requires specific hardware. It's good news for the platform really, as I am sure other vendors will start releasing similar kits.
The MB will need to have a separate clock chip, like the Asus CH6, ASRock Taichi and Gaming Professional, Gigabyte 7.
The lower-end boards won't work at those speeds.
Not happening because...
... and most of those are high end.
That is absolutely true. Without my dual xeon I might actually be tempted to go for a setup like that.
That may be true, but the 3200 kit I linked has been qualified to work on a $100 Gigabyte B350 board. The GA-AB350-GAMING 3 board is in the QVL list. I am sure other B350 boards will work with it also as they begin to test them. I think these kits are working at faster speeds because they are being properly binned for the AM4 platform, but I could be wrong about that.
I certainly hope that it works at the rated speed.
No other 4-DIMM setup works at that speed at present; in fact, the existing QVL 2-DIMM 3200 kits are still hit and miss to get going at 3200.
Things are certainly improving by the day, but high-speed stable memory across all motherboard SKUs is not there yet. One downside that Ryzen BIOSes have and the Intel ones don't is that for memory to be configured to work on an AM4 motherboard, it has to be tested and configured by AMD first, and then AMD provides a code to the motherboard manufacturer for inclusion in the BIOS. With Intel BIOSes each manufacturer can validate their own memory settings.
Niice. Maybe there's yet hope for my Corsair 64 GB kit to some day reach 2666. ;-)