New Community Involved Project -- Standardizing All The Things

Readme Markdown from a not-yet-public project I am working on.

The Repeatable Benchmarking Platform Project

This project is designed to automate the creation of easily repeatable and reproducible benchmark environments on Windows through the use of PowerShell and other automation technologies.
The idea is to do the following (a sketch of the overall flow appears after this list):
1) programmatically bring a machine fully up to date
2) configure it to minimize interference from background processes
3) install drivers appropriate for the graphics hardware
4) catalog & inventory the machine hardware/software settings
5) provide a platform for standardized, repeatable software testing, including
a. games
b. professional software
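
A minimal sketch of how these phases might chain together at the top level -- the script names, service name, and commands below are hypothetical placeholders, not final:

```powershell
# Hypothetical top-level flow -- script names and commands are placeholders.

# Phase 1: bring the machine fully up to date (may reboot; run manually first).
# .\UpdateEverything.ps1

# Phase 2: quiet the system to minimize interference from background processes.
Stop-Service -Name "SysMain" -ErrorAction SilentlyContinue  # e.g. Superfetch

# Phase 3: detect the graphics hardware so the right driver can be installed.
$gpu = Get-CimInstance Win32_VideoController | Select-Object -First 1 -ExpandProperty Name

# Phase 4: catalog and inventory the machine's hardware/software settings.
.\Inventory.ps1

# Phase 5: run the standardized, repeatable benchmarks.
.\RunBenchmarks.ps1
```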

We can extend this platform for game testing, such as GTA V, by providing batch files or PowerShell scripts to automate benchmarking and by dropping associated configuration profiles into the game's settings. In general, a benchmark script should do the following (a skeleton sketch follows the list):

1) (optionally) download the game from somewhere
2) configure the game by copying a settings profile
3) run the game to benchmark it with the documented settings
4) (optionally) run a program like HWMonitor, but capture the data to a CSV file; ideally we capture:
a. temperatures
b. fan speeds
c. CPU frequency
d. CPU utilization
e. GPU utilization(?)
5) capture the results from the benchmark and store them with the machine inventory from step 4 of the first list above, along with the name of the settings file
6) (optionally) programmatically generate game-specific graphs of the data
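
As a concrete illustration of those steps, a per-game script might look like the following sketch. The paths, file names, and the -benchmark flag are hypothetical examples, not a working harness for any particular game:

```powershell
# Hypothetical per-game benchmark skeleton -- paths and flags are examples only.
$gameDir    = "C:\Games\ExampleGame"               # step 1: assume the game is installed
$profileDir = ".\3840x2160_high_fsaa_nohairworks"  # step 2: documented settings profile

# Step 2: configure the game by copying the settings profile into place.
Copy-Item -Path "$profileDir\settings.xml" -Destination "$gameDir\settings.xml" -Force

# Step 4 (optional): sample CPU utilization to a CSV file while the benchmark runs.
$outDir = (Get-Location).Path
$counterJob = Start-Job {
    Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 1 -MaxSamples 300 |
        ForEach-Object { $_.CounterSamples } |
        Export-Csv -Path (Join-Path $using:outDir 'monitor.csv') -NoTypeInformation
}

# Step 3: run the game's built-in benchmark mode and wait for it to finish.
Start-Process -FilePath "$gameDir\game.exe" -ArgumentList "-benchmark" -Wait

Stop-Job $counterJob; Remove-Job $counterJob

# Step 5: store the results next to the machine inventory, tagged with the profile name.
Copy-Item "$gameDir\benchmark_results.txt" ".\results\$(Split-Path $profileDir -Leaf).txt"
```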

The game_benchmarks directory contains one directory for each game, each holding a game-specific PowerShell script that performs the steps above (an illustrative layout follows). Any dependencies, such as game configurations, should be stored in separate sub-directories. We suggest directory names such as 3840x2160_high_fsaa_nohairworks or 0800x0600_low_noaa_nohairworks, but if there is ever any confusion, the raw game settings file inside the folder can be examined for confirmation/clarification.
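
To make that layout concrete, an illustrative tree (the game and file names are examples) might be:

```
game_benchmarks/
  gta_v/
    benchmark.ps1
    3840x2160_high_fsaa_nohairworks/
      settings.xml
    0800x0600_low_noaa_nohairworks/
      settings.xml
```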

The desired result of this project is to improve the transparency and repeatability of software benchmarks. Further, it should be user-friendly enough that a hardware enthusiast could download this project, run the benchmarks, and compare the results against similar systems.

For anyone out there who may be writing benchmarks and submitting pull requests, please keep in mind that the users of the scripts may not be power users; your benchmarking script will need error checking and helpful hints for when things go wrong ("Did you install GTA V? Did you configure your Steam user/pass?" etc.). It may be that a gamer would want to run these scripts to confirm their system is performing appropriately.
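
As an example of the tone and level of checking intended, a hedged sketch of such pre-flight checks (the path and environment variable below are hypothetical):

```powershell
# Hypothetical pre-flight checks -- the path and variable names are examples.
$gameExe = "C:\Program Files\Rockstar Games\Grand Theft Auto V\GTA5.exe"

if (-not (Test-Path $gameExe)) {
    Write-Warning "GTA V not found at $gameExe. Did you install GTA V? If it is on another drive, pass the install path to the script."
    exit 1
}

if (-not $env:STEAM_USER) {
    Write-Warning "No Steam credentials configured. Did you set up your Steam user/pass?"
    exit 1
}
```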

The main script in this project can be edited in any text editor. Inside it you will find calls to all of the benchmark scripts one might want to include, both to run benchmarks and to generate data.
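
The structure could be as simple as a list of benchmark invocations that you comment out to taste (the script paths here are illustrative):

```powershell
# RunBenchmarks.ps1 -- illustrative structure; comment out what you don't want.
.\Inventory.ps1                                 # machine catalog (recommended)

.\game_benchmarks\gta_v\benchmark.ps1
# .\game_benchmarks\witcher3\benchmark.ps1      # commented out: skipped this run

# .\program_benchmarks\premiere\benchmark.ps1   # commented out: skipped this run
```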

The program_benchmarks directory contains one directory for each app; inside is a PowerShell script meant to benchmark a particular program. There may be additional PowerShell scripts to download dependencies, e.g. an Adobe Premiere benchmark may require downloading and unpacking a Premiere project and sample footage.
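
Such a dependency-fetching step could be a short sketch like the following (the URL and file names are placeholders):

```powershell
# Hypothetical dependency fetcher -- the URL and file names are placeholders.
$url     = "https://example.com/premiere_benchmark_assets.zip"
$archive = Join-Path $env:TEMP "premiere_benchmark_assets.zip"

if (-not (Test-Path ".\assets")) {
    Invoke-WebRequest -Uri $url -OutFile $archive   # download sample project + footage
    Expand-Archive -Path $archive -DestinationPath ".\assets"
}
```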

USAGE:

UpdateEverything.ps1

This script updates your machine with the latest Windows updates. It was designed for use on Windows 10. It may require multiple reboots to complete. This script does not run automatically as part of the benchmark system.
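
One plausible way to implement it, assuming the community-maintained PSWindowsUpdate module is acceptable as a dependency, is sketched below:

```powershell
# Sketch assuming the PSWindowsUpdate module -- one possible approach, not final.
if (-not (Get-Module -ListAvailable -Name PSWindowsUpdate)) {
    Install-Module -Name PSWindowsUpdate -Force -Scope CurrentUser
}
Import-Module PSWindowsUpdate

# Accept all available updates and reboot automatically; re-run after each
# reboot until no updates remain.
Install-WindowsUpdate -AcceptAll -AutoReboot
```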

Inventory.ps1

This script inventories the system. It runs automatically as part of the benchmark process unless it is commented out in the RunBenchmarks script.
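
A minimal sketch of what the inventory step might collect using built-in CIM classes (the output path is an example):

```powershell
# Minimal inventory sketch using built-in CIM classes; output path is an example.
$inventory = [ordered]@{
    ComputerSystem = Get-CimInstance Win32_ComputerSystem
    OS             = Get-CimInstance Win32_OperatingSystem
    CPU            = Get-CimInstance Win32_Processor
    GPU            = Get-CimInstance Win32_VideoController
    Memory         = Get-CimInstance Win32_PhysicalMemory
    Drivers        = Get-CimInstance Win32_PnPSignedDriver |
                         Select-Object DeviceName, DriverVersion
}
$inventory | ConvertTo-Json -Depth 4 | Set-Content ".\inventory.json"
```

Comparing two of these JSON files side by side is what makes driver and OS version differences visible when results diverge.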

RunBenchmarks.ps1

This script runs the benchmark scripts that are not commented out. Note that for Steam games, you may need to specify your Steam username and password so that a game can be downloaded. You may also have to specify the drive letter and path where a particular game is installed.
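
For example, an eventual invocation might look like this; the parameter names are hypothetical until the script is published:

```powershell
# Hypothetical invocation -- parameter names are not final.
.\RunBenchmarks.ps1 -SteamUser "myuser" -GamePath "D:\SteamLibrary\steamapps\common\Grand Theft Auto V"
```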

This section TBD.

This project does not exist yet -- although it kind of does, in that I have bits and pieces of it. For example, I can run GTA V benchmarks in a canned fashion. I suspect some third-party programs like AutoIt would be needed to automate "every" game's testing, but for getting started toward a 0.01 release on GitHub, that may not be necessary.

The goal here is to use open source to level the playing field, for everything.

We could even include software like ASUS RealBench, as long as the PowerShell script for that benchmark downloads and configures it in a mostly automatic way -- why not?

All of this on GitHub, accepting pull requests, etc.

I thought I would post this here, now, to get feedback and to see if there are similar projects out there that I have missed. Perhaps there are open and permissive scripts that already exist for things like updates and configuring Steam (for example, for GTA V I still have to set the launch command line manually via the Steam client for benchmarking, which is annoying). Perhaps the community would get involved and develop some of the game benchmarking scripts, scour forums for them, and we could organize them into a central repository.

Good idea, but I'm concerned about software updates and how they might change the results.
An NVIDIA driver update might add 20% FPS to a game, ruining the repeatability of the results.

That problem is solved indirectly by the machine inventory being collected -- when one compares results, one can see versions.

There is simply no fixing driver/OS updates. This just gives you the data to rule them out -- much like game settings may vary; there may be honest mistakes; it is a tedious process.

If fully automated, a lot of problems like software updates and game updates either go away completely or are easy to suss out for anyone who cares to investigate.

Plus, this lets us gather a LOT more data -- casual enthusiasts can bench their systems and we have inventories. Eventually we may have to sign said inventories, or automate tools like CPU-Z to post the inventories to the website and save them to a file.

tl;dr: it doesn't matter if there are differences; what matters is documentation of the state of the system.

This sounds a bit like what Barnacules did at Microsoft... I wonder why he doesn't use his software test harness and automation knowledge for PC testing and benchmarking?

I LOVE this idea. It could give data novices like myself a new hobby (dataset) to play with.

Is there consideration for statistically valid sampling for a given setup with an unknown population? How many times should a test be run?

Also, any consideration for utilization analysis? This would show the break points in hardware: what an upgrade would actually net in terms of the target metric (Y), and hardware optimization based on use case. Given enough data, it could be like the next level of PC Part Picker: pick your parts and use case(s) to see what performance you can expect based on collected results. Some statistical rigor should be used to highlight anomalous submissions.

I like the idea of taking a "snapshot" of what is. Methods exist to figure out whether elements play a role in your hypothesis. A pool for creating data experiments. Awesome!

OPT-IN or OPT-OUT pooling of results for research purposes? I might have missed this, but I didn't see it specifically called out.

If it were a goal of the project, I think you might be able to replicate exact builds with the right Windows version, though I don't know how easy this is to do with Windows. On Linux it wouldn't be overly difficult (but maybe that's another project).

This sounds like a good idea, though I've nothing to add at the moment.

That is no small task. One of the most comprehensive tools I have seen for step 4 resides in the TERA client launcher: the En Masse diagnostic tool.

Admittedly, it varies from game to game. I've got it working for GTA V already, and a few other games. But some games are tough.