Quickie Rise of the Tomb Raider DX12 testing on Ryzen vs Kaby Lake

Darkrage · April 2, 2017, 7:26am

So basically:

GeoThermal Valley - Ryzen 86.8fps, KL 88.5fps, difference (88.5-86.8) /86.8 *100 = 1.95%

Prophet's Tomb - Ryzen 90.6fps, KL 92.8fps, difference (92.8-90.6) /90.6 *100 = 2.42%

SpineOfTheMountain - Ryzen 121.3fps, KL 126.1fps, difference (126.1-121.3) /121.3 *100 = 3.95%

Average performance delta (1.95 + 2.42 + 3.95) /3 = 2.77% in favor of Kaby Lake

Ryzen 4100MHz, Kaby Lake 4500MHz, difference (4500-4100) /4100 *100 = 9.75%

In other words, to normalize for IPC we should increase Ryzen's fps by 9.75% (or decrease KL's)

GVT - Ryzen 86.8 + 9.75% = 95.26 vs KL 88.5, difference 7.64%

PT - Ryzen 90.6 + 9.75% = 99.4 vs KL 92.8, difference 7.11%

SotM - Ryzen 121.3 + 9.75% = 133.1 vs KL 126.1, difference 5.55%

Average delta (7.64 + 7.11 + 5.55) /3 = 6.77%

Clock for clock, the Ryzen 7 1800X has 6.77% more IPC than the 7700K

Maths
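
For anyone who wants to rerun the arithmetic in this post, here is a minimal Python sketch. It uses only the FPS averages and clock speeds quoted above; the function names are made up for illustration, and it rests on the same assumption the post makes, namely that FPS scales linearly with core clock. It keeps full precision, so its figures can differ in the last digit from hand-rounded ones.

```python
# FPS averages (Ryzen, Kaby Lake) and clocks as quoted in the post above.
FPS = {
    "GeoThermal Valley": (86.8, 88.5),
    "Prophet's Tomb": (90.6, 92.8),
    "SpineOfTheMountain": (121.3, 126.1),
}
RYZEN_MHZ = 4100
KABY_LAKE_MHZ = 4500


def pct_gain(value, baseline):
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def raw_deltas(fps):
    """Kaby Lake's lead over Ryzen at the tested clocks, per scene."""
    return {scene: pct_gain(kl, ryzen) for scene, (ryzen, kl) in fps.items()}


def clock_normalized_deltas(fps, ryzen_mhz, kaby_mhz):
    """Ryzen's lead over Kaby Lake after scaling Ryzen FPS to Kaby Lake's clock.

    Assumption: FPS scales linearly with core clock, which real games rarely do.
    """
    scale = kaby_mhz / ryzen_mhz
    return {
        scene: pct_gain(ryzen * scale, kl) for scene, (ryzen, kl) in fps.items()
    }


if __name__ == "__main__":
    raw = raw_deltas(FPS)
    norm = clock_normalized_deltas(FPS, RYZEN_MHZ, KABY_LAKE_MHZ)
    for scene in FPS:
        print(f"{scene}: KL leads by {raw[scene]:.2f}% raw, "
              f"Ryzen leads by {norm[scene]:.2f}% clock-normalized")
    print(f"Average raw delta: {mean(raw.values()):.2f}%")
    print(f"Average normalized delta: {mean(norm.values()):.2f}%")
```

The normalized delta is measured against Kaby Lake's FPS, which matches how the PT and SotM lines above were worked out.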

MisteryAngel · April 2, 2017, 4:16pm

That doesn't really sound entirely correct to me.
If you look at the whole picture,
one game of course doesn't tell the whole story.
Kaby Lake has higher per-core performance than Ryzen.
If you run a single-threaded test like Cinebench R15 single thread,
then Kaby Lake scores significantly better.

wendell · April 2, 2017, 4:23pm

It's more complicated than that. Kaby can do one complex and up to two simple instructions per clock. Ryzen can do two complex instructions per clock.

It depends on the workload.

MisteryAngel · April 2, 2017, 4:56pm

Yes, it does of course depend on the specific workload.
But when it comes to gaming, in most scenarios Kaby Lake is still ahead.
Higher memory clocks for Ryzen do close the gap a bit in certain titles.

But yeah, in my opinion the Ryzen 7 CPUs aren't really to be considered gaming CPUs all that much.
Therefore the Ryzen 5 1600X and 1600 will be way more appealing, I think.
If the rumored prices of around $249 MSRP are true,
it will be a nice debate between that and a quad-core i5.
On the other hand, the 1700 costs about as much as the 7700K at the moment.
So if you do more than gaming, then the 1700 will be very good value for money.
Of course, the overall gaming experience on Ryzen is just fine really.

wendell · April 2, 2017, 5:21pm

Yeah, I've been doing a lot of fair testing the last few weeks on the i7 7700K at a 4.5 OC vs Ryzen at a 4.1 OC, and the 7700K and Ryzen are mostly within a few % of each other. Slight edge to the 7700K.

MisteryAngel · April 2, 2017, 5:29pm

Yeah, the memory overclocks on Ryzen really seem to be helping.
I'm actually curious what happens if you pick the game with the biggest gap between the two
and downclock the 7700K to 4.1GHz as well,
to see if that changes it a bit.

I'm really looking forward to seeing the R5s come out as well,
to see how those stack up against an i5 and the R7 series in gaming.
Curious whether the 3+3 CCX design on the 1600 series is going to have any impact on gaming performance
vs the 4+4 CCX design on the R7 series.

Zorak · April 2, 2017, 5:51pm

You left the number of cores out of the equation, though. I doubt the game uses twice as many cores in the Ryzen case, but it's not clear that it's limited to 4 in both cases either.

Darkrage · April 2, 2017, 9:06pm

Nobody provided CPU usage statistics; that's why I left the cores out. In any case, my math is correct regarding actual FPS output from both CPUs, assuming Ryzen scaling is linear, which it isn't. At 4.5GHz Ryzen will be far better than KL.

@wendell I'd really like a 7700k vs Ryzen performance percentage delta in games as well as in other workloads. We should be able to come up with a performance grid of sorts that has game A independent or backed by company B on processors X and Y with performance delta Z. This would be the definitive consumer guide.

@MisteryAngel - well, the % are correct mathematically for this particular game

TBH Ryzen is the better CPU to have regardless. And the platform is better also. Check this as well: https://www.youtube.com/watch?v=88IOcFE4yho

gtbtk · April 28, 2017, 12:47am

The gameplay with GTA V has two noticeable things going on.

1. Ryzen frame rates are, in general, lower than the frame rates on Kaby Lake.
2. The extreme deviations from the average frame rate are generally larger on an Intel platform than on a Ryzen system.

The reason for item 1 is that memory latency on a Ryzen system is almost double that of a dual-channel Intel system (Kaby Lake) running the same RAM. The end result is that the Ryzen CPU cores have to wait slightly longer to get the mission-critical data from memory before they can process the next frame and send it off to the GPU. New firmware releases and experiments with lower memory straps combined with higher REFCLK settings have shown that gaming-type performance improves, because you get access to the tighter timings used at the lower strap.

That is all down to AMD setting tighter secondary memory timings at the 2666 strap compared to much looser settings at the 3200 strap. Currently the secondary timings on AMD systems are not user adjustable, which is why people are having headaches getting 3200MHz RAM working.

Running memory at 3200MHz C14 by using the 2666 memory strap and setting REFCLK to 120MHz will perform better than running memory at 3600MHz C14 by using the standard 3200 strap and setting REFCLK to 112.5MHz. The 3200MHz machine will get a CPU score of 9000 in Time Spy, compared to 8000 with the 3600MHz setup and 7500 using 100MHz REFCLK with 3200MHz RAM. It is also the reason why CPU utilization on the Ryzen PCs running the game seems to be lower: the processor is being starved of data from the memory subsystem. By using the lower memory straps, latency drops from 85ns down to 70ns.
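
The strap and REFCLK combinations in this post work out as follows. This is a minimal Python sketch that assumes the effective memory speed is simply the strap value scaled by REFCLK relative to a 100MHz base; `effective_mem_speed` is a made-up helper name, and the sketch says nothing about the secondary timings, which are the part the post credits for the performance difference.

```python
BASE_REFCLK_MHZ = 100.0  # assumed default reference clock


def effective_mem_speed(strap, refclk_mhz, base_refclk_mhz=BASE_REFCLK_MHZ):
    """Effective memory speed when REFCLK is raised above its base value.

    Assumption: the memory speed scales linearly with REFCLK.
    """
    return strap * refclk_mhz / base_refclk_mhz


if __name__ == "__main__":
    # The three configurations mentioned in the post: (strap, REFCLK in MHz).
    configs = [(2666, 120.0), (3200, 112.5), (3200, 100.0)]
    for strap, refclk in configs:
        speed = effective_mem_speed(strap, refclk)
        print(f"{strap} strap @ {refclk}MHz REFCLK -> about {speed:.0f}MHz")
```

So the 2666 strap at 120MHz lands at roughly 3200MHz, and the 3200 strap at 112.5MHz lands at 3600MHz, which are the two setups being compared.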

That is still nowhere near as good as the 45ns that a Kaby Lake system can do, and it is why Ryzen is still trailing behind. GPUs are now powerful enough to reach the limits of a Ryzen system's capacity to process and feed instructions to the GPU at the same rate as Intel can, given the current level of BIOS refinement on Ryzen. This is also the reason why the R5 1600 chips have basically the same gaming performance as the R7 chips, and why a 1080 Ti on Ryzen can at times be outperformed by a 1080 on Intel machines. The latency causes what is, in effect, a performance ceiling.

All the other things that the media have blamed for the poor gaming performance with a high-end GPU, such as CCX thread switching and SMT, are only symptomatic of the high latency issue.

The second issue is that the extreme drops in frame rate that can be observed on Intel platforms are what cause the game to appear to stutter. It would appear that the CPU and memory can process data so quickly that the drivers cannot keep up.

Nvidia, with their DX11 driver support, actually does something quite clever. DX11 rules put draw calls on a single thread, and while DX11 can be multithreaded, most games tend to also use that thread for the other game logic that is also used by the driver. Nvidia's DX11 driver uses a software scheduler to take a game's single-threaded DX11 draw calls and game logic and separate the game logic components to run across multiple cores. This incurs a higher CPU overhead across all cores but often results in improved performance by avoiding a single-threaded bottleneck.

I suspect, and I am only making a guess here, that the stuttering could be caused by the CPU on Intel processing data at a rate that periodically exceeds the driver's ability to intercept the thread and split it up in a timely manner. The Ryzen systems, because they are being held back to start with, cannot generate loads that outstrip the driver, which keeps things at a steadier tempo.

Wendell has already tested the USB latency and found that there is no impact on performance.

As a side note, AMD's GCN architecture uses a hardware scheduler and cannot take a game's single-threaded DX11 game logic and split it across multiple cores. The result is that in games that heavily place game logic and draw calls on a single thread, AMD performance suffers while Nvidia benefits from the multithreading.

Assuming, of course, that the CPU can get enough data through the memory subsystem, games that use the multithreaded features of DX11, or DX12/Vulkan titles that are already multithreaded, make it possible for AMD performance to improve much more than Nvidia's on similar-level GPUs such as the 480 vs the 1060, because Nvidia's software scheduler incurs a CPU overhead across multiple cores to split draw calls.

wendell · April 28, 2017, 10:53am

A lot here to get into. But I put my hands on a system that had a memory latency of ~60ns and it was swanky. Hallock said there had been 14 revisions to UEFIs and AGESA and that the next major update is close to release.

Their priority was compatibility first then everything else.

gtbtk · April 28, 2017, 1:31pm

It would behoove them to actually come out and say that publicly in the beginning to better manage expectations. They could even have said that the stability settings on day one were there to allow the widest range of memory to be compatible, and that new BIOS releases would provide tighter performance with lower latencies, rather than just hoping no one would notice.

Did you manage to see what timings they were running on the RAM?

The guys at overclock.net have 3200 c14-13-13-13-34-1T running on CH6 with 100mhz refclk - 62ns latency

Heading in the right direction. The new BIOS is next month.

wendell · April 28, 2017, 2:05pm

Yeah, that's about right. And there are entirely separate UEFIs for 2T RAM instead of "let's slow it down so it's '1T'", and that was interesting. They didn't seem bashful about saying that if you want set-it-and-forget-it, get Samsung B-die.

Also, I understand some of the settings in UEFIs and why some mobos still have trouble with Samsung B-die at more than 2666. Will be doing a video on how to hunt for settings that work on those boards until the software that does it is finished. All the pieces are there.

gtbtk · April 29, 2017, 12:13am

I know that Asus has put out some 2T CH6 UEFI versions. I don't know about ASRock, Gigabyte or MSI.

Any news on any expansion of the number of supported ram straps?

looming-hawk · April 29, 2017, 1:56am

Any idea when we can expect C10 at 3000 for Ryzen when using all 4 slots with dual-rank memory? Or whether Naples will have better support for tighter timings at a higher clock than current Ryzen systems? I have plans to build a dual-socket Naples system with the tightest timings possible in a custom-fabricated case. (I will provide my CAD files and pics for those who want them after it is built.) Sadly, Ryzen does not have enough PCIe lanes for my future plans, which is why I am looking towards Naples.

glenmartinez · April 29, 2017, 8:59am

GeoThermal Valley:
Min FPS: 68.5
Max FPS: 109.3
Average FPS: 86.8

Pholostan · April 29, 2017, 10:12am

I would say probably never. It doesn't look like Ryzen will be able to run that kind of low latency at that speed. Especially not with dual-rank memory. Single rank maybe, but I would probably wait for Zen2 and Zen3 for that. Really tight timings probably take some refinement of the IMC and motherboard platform.

Naples is a server platform, which means ECC RAM and probably registered too. Look at the typical speeds ECC RAM comes in. There are some new 2666MHz ECC REG modules showing up now, at CL17. That CAS latency is also typical for the easier-to-get 2400MHz ECC DDR4. Ordinary DDR4 might very well work with Naples, but low latencies? Probably not, as it will be certified for ECC REG in the first place.
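
To put the CAS figures in this exchange on a common scale, a CAS latency in cycles can be converted to nanoseconds. The sketch below is a minimal Python example assuming standard DDR behaviour (two transfers per clock, so one clock lasts 2000 divided by the MT/s rating, in ns); the helper name is made up, and it covers only the CAS component, not the full memory latency a benchmark would measure.

```python
def cas_latency_ns(cas_cycles, speed_mts):
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR memory does two transfers per clock, so the clock period
    in ns is 2000 / (rated speed in MT/s).
    """
    return cas_cycles * 2000.0 / speed_mts


if __name__ == "__main__":
    # (label, CAS cycles, rated speed) taken from the posts above.
    kits = [
        ("C10 at 3000 (the setup asked about)", 10, 3000),
        ("CL17 at 2666 (ECC REG)", 17, 2666),
        ("CL17 at 2400 (ECC)", 17, 2400),
    ]
    for label, cas, speed in kits:
        print(f"{label}: {cas_latency_ns(cas, speed):.2f} ns")
```

On these numbers the ECC modules sit at roughly twice the CAS time of the C10 at 3000 configuration being asked about.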

But we'll see. AMD could do something interesting workstation-wise with their big socket. I'm much more interested in Naples as a workstation implementation than I'm interested in Ryzen.

gtbtk · April 29, 2017, 10:56pm

Things are improving. Latency is down to 62.5ns as frequency support improves and timings are tightened up. I know that AMD launched Ryzen targeting stability over performance. Unfortunately, it took them until just recently to start communicating things like that to the enthusiast tech community, some of whom know as much or more about the theoretical aspects of CPU tuning and interconnectivity as AMD does. While AMD published fuzzy descriptions of the architecture, what has been lacking is detailed information on how the chips actually work at a physical level. That information has come from the marketing material combined with a whole lot of reverse engineering on the part of the community.

Remember that the Ryzen memory controller already supports ECC memory, at least partially. Just not officially. The entire Zen family of CPUs is designed to be modular and centered around the Infinity Fabric, which actually has loads of bandwidth.

Want a 16-core chip? Add an extra 2 CCX modules. 32-core CPU? Add an additional 6 CCX modules.

Want 8 channel memory controllers? add 3 extra memory controller modules with additional interconnects to the Fabric.
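
The module counts above follow from simple division. Here is a minimal Python sketch of that bookkeeping, assuming 4 cores per CCX (the 4+4 layout on the R7 series mentioned earlier in the thread) and, as a further assumption, that each memory controller module provides two channels. All names are hypothetical.

```python
CORES_PER_CCX = 4            # from the 4+4 CCX layout on the R7 series
CHANNELS_PER_CONTROLLER = 2  # assumption: one dual-channel controller per module
BASE_CCX = 2                 # an 8-core Ryzen 7 as the starting point
BASE_CONTROLLERS = 1


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def extra_ccx_needed(target_cores):
    """CCX modules to add to an 8-core part to reach `target_cores`."""
    return max(ceil_div(target_cores, CORES_PER_CCX) - BASE_CCX, 0)


def extra_controllers_needed(target_channels):
    """Memory controller modules to add to reach `target_channels`."""
    total = ceil_div(target_channels, CHANNELS_PER_CONTROLLER)
    return max(total - BASE_CONTROLLERS, 0)


if __name__ == "__main__":
    for cores in (16, 32):
        print(f"{cores} cores: add {extra_ccx_needed(cores)} CCX modules")
    print(f"8 channels: add {extra_controllers_needed(8)} controller modules")
```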

The challenge that AMD has been dealing with is not the Infinity Fabric itself; it is the separate interconnects that connect the various controllers up to the fabric. Each interconnect is limited to 32 bytes per cycle, and the traffic between modules needs to be managed/orchestrated.
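
As a rough feel for what 32 bytes per cycle means, the sketch below turns it into a bandwidth figure. The post does not say which clock the interconnect runs at, so the clock is a parameter here; the example values assume it tracks the memory clock (half the DDR transfer rate), which is an assumption made for illustration. Function names are hypothetical.

```python
BYTES_PER_CYCLE = 32  # per-interconnect limit quoted in the post


def link_bandwidth_gbs(clock_mhz, bytes_per_cycle=BYTES_PER_CYCLE):
    """Peak bandwidth of one interconnect in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


def memclk_mhz(ddr_rate_mts):
    """Memory clock for a DDR transfer rate (two transfers per clock)."""
    return ddr_rate_mts / 2.0


if __name__ == "__main__":
    # Assumption: the interconnect clock equals the memory clock.
    for rate in (2666, 3200):
        clock = memclk_mhz(rate)
        print(f"DDR4-{rate}: {clock:.0f}MHz -> "
              f"{link_bandwidth_gbs(clock):.1f} GB/s per interconnect")
```

Under that assumption, faster memory also raises the per-interconnect ceiling, which would fit with the gains people in this thread report from higher memory speeds.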

Benefits of the approach is the ability to release new chips with different core layouts without having to spend all their money on R&D to develop the new chip.

The con of the approach is that you will forever be fixed at the max 4GHz single-core performance that we see across the Ryzen range, at least until you create a CCX version 2 that can clock faster. The other con is the additional compromises that you have to make whenever you have a modular design: you can't always have a "big pipe" where you might like it, so you have to work around it with a number of smaller pipes that match up with the available connectors and then manage the traffic/timings.

Given that servers do not tend to be used with high-performance GPUs for gaming, memory latency is not quite as important in most server loads. Even in latency-affected tasks such as compression, where it does impact performance, it is not that bad now and is improving, so there is no reason why these chips cannot succeed in the server space.

The bigger the socket, the faster the chips will potentially be able to run, as heat loads can be spread over a larger area. That gives AMD the option of server chips with Ryzen-like single-core performance or higher-density chips with lower clocks and many, many cores.

backbone · May 1, 2017, 9:18am

Running the benchmark on a 1700X @ 4GHz with 32GB of RAM (4 DIMMs at 2666MHz) and 2 Fury X cards in DX12 multi-GPU nets me:

1080p
Mountain Peaks: 202.61av 87.93min 275.76max
Syria: 105.36av 6.42min 167.79max (dragged down by a 6fps min when the camera moves past Lara and focuses on the temple)
Geothermal Valley: 128.03av 79.01min 157.87max

1440p
Mountain Peaks: 164.55av 91.60min 218.43max
Syria: 99.05av 6.21min 150.62max
Geothermal Valley: 128.01av 91.11min 160.09max

Both of the Fury X cards are at stock clocks with a bump to power allowance.
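
One way to read these numbers is to check how much of the 1080p average each scene keeps at 1440p. The Python sketch below does that with the averages listed above; it assumes "1080p" means 1920x1080 and "1440p" means 2560x1440, and the names are made up. A scene that keeps nearly all of its frame rate despite the extra pixels is probably limited by something other than the GPUs, though this alone cannot say what.

```python
# Average FPS per scene as listed above: (1080p, 1440p).
AVERAGES = {
    "Mountain Peaks": (202.61, 164.55),
    "Syria": (105.36, 99.05),
    "Geothermal Valley": (128.03, 128.01),
}

# Assumed resolutions behind the "1080p" and "1440p" labels.
PIXELS_1080P = 1920 * 1080
PIXELS_1440P = 2560 * 1440


def retained_pct(fps_low_res, fps_high_res):
    """Share of the lower-resolution frame rate kept at the higher one."""
    return fps_high_res / fps_low_res * 100.0


if __name__ == "__main__":
    pixel_ratio = PIXELS_1440P / PIXELS_1080P
    print(f"1440p renders {pixel_ratio:.2f}x the pixels of 1080p")
    for scene, (fps_1080, fps_1440) in AVERAGES.items():
        kept = retained_pct(fps_1080, fps_1440)
        print(f"{scene}: keeps {kept:.1f}% of its 1080p average at 1440p")
```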

gtbtk · May 2, 2017, 12:25pm

Those results look really quite good, at least when compared to the only thing I could find online to compare them to, running on a 6700K here

What memory timings are you using, what model kit is it, and what is the latency of your memory @ 2666?

Have you tried running in single-GPU mode? How is the performance there? Have you tried Time Spy? What CPU score can your rig manage?

backbone · May 6, 2017, 9:59am

I'm using gskill flare-x modules. This is the newegg link.

At the time I was using the Asus v1.02 BIOS and could only get 2666MHz with 16 16 16 32 timings. Since then Asus released a version 1.07 BIOS and I can now get 2933MHz, but I had to set 18 18 18 36 timings to get there with all four DIMM modules.
With only two modules you can set 13 13 13 26 and hit 3200MHz easily.
Since you asked, I ran Time Spy to see what it would do. I had just updated my graphics drivers and I guess 3DMark doesn't approve of AMD's latest, but nonetheless this was the result.

Seemed to be decent, but I didn't try overclocking the cards, just bumped up the power allowance. The latency test I used was from userbenchmark.com; it's a simple graph showing the time for progressively larger reads. Here is that result.