Should we consider revising how we benchmark GPUs?

I've been puzzled by the subjective experience of stutter in games when measurements were showing consistent framerates of 60+. The Low value wasn't normally dipping below 50, so the experience should have been perfectly smooth, right?

Wrong. I worked out that the issue must be with the averages. Frames that take unreasonably long to render can create visible stutter without even showing up in the Low FPS value, because the rest of the frames within that second render quickly enough to keep the per-second count high. If the delay between two frames is above 41.6 ms, effectively 24 FPS, you're watching a slideshow at that moment.
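To illustrate with made-up numbers (a hypothetical second of frame times, not my benchmark data), a single long frame can hide completely inside a per-second FPS count:

```python
# Hypothetical frame times (ms) for one second of gameplay: 59 fast frames
# plus a single 100 ms hitch. Numbers are illustrative, not measured.
frame_times = [15.0] * 59 + [100.0]

total_ms = sum(frame_times)                 # ~985 ms of render time
fps = len(frame_times) / (total_ms / 1000)  # what a per-second FPS counter sees
worst = max(frame_times)                    # the hitch the average hides

print(f"FPS counter reports: {fps:.0f}")    # ~61 FPS, looks perfectly smooth
print(f"Worst frame: {worst:.0f} ms")       # 100 ms = a visible stutter
```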

I've had a look into this and it turns out the problem was already known and well investigated back in this excellent 2011 article: http://techreport.com/review/21516/inside-the-second-a-new-look-at-game-benchmarking

The guys at TechReport actually incorporated their findings into the way they write their GPU reviews, which makes their website a very valuable source of information you won't find in other places.

This appears to be a particularly bad issue with CrossFire and SLI setups, where there seems to be a tendency to push out two frames in quick succession followed by a larger delay until the next two. Taken to the extreme, you're experiencing half the FPS you're meant to have.

To be perfectly clear, large gaps between individual frames aren't necessarily the GPU's fault alone; they can stem from a range of issues across the CPU, storage, network performance and the game engine itself. The metric should still allow comparison between individual parts changed within the same system.

Fraps lets you record individual frame timings when in benchmarking mode, and this data has been very interesting to dig through. Here's the summary of my GTA V benchmark:

2015-12-16 15:47:07 - GTA5
Frames: 7529 - Time: 101625ms - Avg: 74.086 - Min: 47 - Max: 153

With a min of 47, I was expecting a perfectly smooth run, but it wasn't. Looking into the data, I had 27 frame-to-frame gaps where the effective rate was below 25 FPS. Here's a detailed breakdown of my run:
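For anyone who wants to reproduce this kind of count, here's a minimal sketch of how it could be pulled out of the exported data, assuming the Fraps frametimes CSV layout of a frame number column followed by a cumulative time-in-milliseconds column (the filename is just a placeholder):

```python
import csv

# Read a Fraps "frametimes" export: assumed format is two columns,
# frame number and cumulative time in milliseconds since the benchmark start.
with open("GTA5 frametimes.csv") as f:           # placeholder filename
    reader = csv.reader(f)
    next(reader)                                 # skip the header row
    timestamps = [float(row[1]) for row in reader if row]

# Frame-to-frame gaps in milliseconds.
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]

# Count gaps that fall below an effective 25 FPS (i.e. longer than 40 ms).
slow = [g for g in gaps if g > 1000 / 25]
print(f"{len(slow)} frame gaps below an effective 25 FPS")
print(f"Worst gap: {max(gaps):.1f} ms")
```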

So at this point I want to call on the Tek guys to consider incorporating this metric into their hardware benchmarks, as I'm sure I'm not the only one who would reconsider hardware choices based on a GPU's ability to deliver consistent frame times rather than a high average that hides bursts and stutter.

I also want to call on the community to help me pin down how to make the best use of this data. As you can see, I was just poking around trying to find insights, and some of the data is redundant. What values are useful? What FPS ranges are meaningful? How can we quantify the consistency of performance? At what point (percentile) should we make the cut between expected performance and freakish outliers?

I can write a macro to automatically generate a table like this from the raw timing data and share it with the community, but it would be useful to know what's worth displaying and how first.
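To give an idea of what such a table could contain, here's a rough sketch; the percentile cut and the FPS buckets (60/30/24) are just my assumptions, which is exactly the part I'd like input on:

```python
import statistics

def summarize(gaps):
    """Summarize a list of frame-to-frame gaps in milliseconds."""
    gaps = sorted(gaps)
    pct = lambda p: gaps[int(len(gaps) * p / 100)]    # simple percentile cut
    return {
        "avg FPS":             1000 / statistics.mean(gaps),
        "99th pct frame (ms)": pct(99),                # assumed cut-off point
        "worst frame (ms)":    gaps[-1],
        "frames > 16.7 ms":    sum(g > 1000 / 60 for g in gaps),  # below 60 FPS
        "frames > 33.3 ms":    sum(g > 1000 / 30 for g in gaps),  # below 30 FPS
        "frames > 41.6 ms":    sum(g > 1000 / 24 for g in gaps),  # below 24 FPS
    }

# Example, using the gaps list from the previous snippet:
# for name, value in summarize(gaps).items():
#     print(f"{name:>22}: {value:.1f}")
```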


http://www.vortez.net/articles_pages/frame_time_analysis,1.html

Isn't this called frame time variance?

Just some frame time issues. I don't know every tech site out there, but Digital Foundry, for example, also shows frame timings. This video is a prime example of what you describe: https://www.youtube.com/watch?v=BAkrMx46_n4 In this case it's primarily just a lazy port, but normally you should be able to avoid those problems with a good enough hard drive and a proper CPU and RAM to keep your GPU fed with data.
Edit: Also make sure to have enough VRAM.

I feel Vortez did fair coverage of the subject, but they have failed to incorporate this metric themselves. I've poked through two random GPU reviews on their website and they didn't have anything besides FPS, which is a shame.

The last time I watched a GPU review on Newegg's channel, they included frame timings. Then again, Paul was still working there, so I don't know if they still do it.

Screw benchmarks just always buy the flagship /s

Seriously though, this is something I found out about shortly after building my first PC, and I've valued benchmarks that include it ever since. This type of monitoring is nothing new; it's just that most people don't understand or care, so it isn't in many benchmark reviews.

My argument here is that people aren't going to understand or care about this if it's not typically covered in reviews. Most people won't even realize it's a thing. And this is not a casual techie channel either, so people here will work it out.

Buying a flagship makes sense, but without measuring this metric you might not find out it's got architectural flaws and provides an inferior experience to alternative cards.

I agree with this 100%, but you just hit my pet peeve, and I'm not the only one. Most channels focus on flagship hardware and ignore the less expensive stuff. I get it, sex sells. For many people top of the line is overkill: they can't afford it or it just isn't worth it to them. When a reviewer with a $5000 3-way SLI 980 Ti rig that they got for free says "This game runs great!", I cringe.

Eh, I find perceived stutter varies widely on a game-by-game basis.

I'm more concerned with how reviews use either straight max settings or settings that drop a flagship GPU to 30-50 fps. This concerns me because maxed settings often don't deliver better image quality compared to settings that give a vastly better, more playable fps, especially for midrange cards. Because reviews have done it for so long, we now have kids who only say "well, I want to play x game at max settings" when they have no idea how any setting impacts image quality.

I get that reviewers do it to ensure delineation between cards, games, and image settings, but it's not representative of how regular people game. What good is knowing a Fury X can play GTA V at 35 fps on max settings at 4K? Not a heck of a lot, because who would purposely play a slideshow?

Notably, @Logan doesn't do this and just says to turn the damn filters off, they're not doing anything noticeable anyway. But he's literally the only one.

A majority of reviewers turn off or turn down filters and AA at higher resolutions like 4K. You don't need them there. But at lower resolutions they do make a difference.

To a point they do. I know I'm hard pressed to see a difference between 4x AA/AS and 16x or 32x. And even more hard pressed to say that difference impacts gameplay in any meaningful way.

A better comparison is the difference between SMAA and FXAA, as there are some games that are sprite-heavy and still see noticeable gains from 8x or 16x AA. ARMA 3 is a good example, with trees and grass being abundant and needing exceptional AA filtering to look remotely clear.