I've been trying to run benchmarks on Travis CI and I'm wondering how to make the results more consistent. The plan is to do A/B testing: when a pull request is opened, the benchmarks run first against master and then against the branch to be merged. I then compare the two results, and if the branch loses too much performance, the build fails.
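A minimal sketch of what that pass/fail check could look like, assuming each side produces a list of wall-clock timings in seconds. The function names and the 10% threshold are hypothetical, chosen only for illustration. Comparing medians rather than means is one way to reduce the influence of a single slow outlier run.

```python
import statistics


def relative_slowdown(baseline_times, candidate_times):
    """Fractional change of the candidate's median time versus the baseline's.

    Positive means the candidate (the PR branch) is slower than master.
    """
    base = statistics.median(baseline_times)
    cand = statistics.median(candidate_times)
    return (cand - base) / base


def should_fail_build(baseline_times, candidate_times, max_slowdown=0.10):
    """Return True when the slowdown exceeds the allowed budget.

    max_slowdown=0.10 (10%) is an assumed threshold, not a recommendation.
    """
    return relative_slowdown(baseline_times, candidate_times) > max_slowdown
```

In a CI script, the result could drive the exit code, for example `sys.exit(1 if should_fail_build(master_times, branch_times) else 0)`, where `master_times` and `branch_times` are whatever the benchmark runner collected.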
My issue is that even when running locally, the second set of benchmarks can give inaccurate results, because running the first set can heat the machine enough for CPU thermal throttling to kick in. There are probably a lot of other variables that I'm not aware of...
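One possible way to keep a warm-up or throttling effect from landing entirely on the second set is to interleave the two sides instead of running all of A and then all of B. The sketch below assumes each benchmark can be wrapped in a zero-argument callable. `time_once` and `interleaved_timings` are hypothetical helper names. It does not prevent throttling; it only spreads any drift across both sides.

```python
import time


def time_once(fn):
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def interleaved_timings(fn_a, fn_b, rounds=10):
    """Time fn_a and fn_b in alternating order for the given number of rounds.

    The order flips every round (A,B then B,A) so that neither side is
    always the one running on an already-hot machine.
    """
    times_a, times_b = [], []
    for i in range(rounds):
        if i % 2 == 0:
            times_a.append(time_once(fn_a))
            times_b.append(time_once(fn_b))
        else:
            times_b.append(time_once(fn_b))
            times_a.append(time_once(fn_a))
    return times_a, times_b
```

For the master-versus-branch case, `fn_a` and `fn_b` could each launch the benchmark binary from a separate checkout, for example via `subprocess.run`.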
I was thinking about using cgroups to limit CPU usage in some way, hopefully ensuring that each benchmark gets the same amount of CPU cycles. Has anyone tried that? I'm not familiar with this sort of thing, so I'd like to hear what people do for their own benchmarking. Also, should I use something other than cgroups?
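For reference, a rough sketch of what the cgroups idea could look like, assuming cgroup v2, where a group's `cpu.max` file holds a quota and a period in microseconds and `cgroup.procs` accepts a PID. The helper names and the cgroup directory are hypothetical. Writing these files normally needs root and the `cpu` controller enabled for the parent group, and whether Travis CI permits that is not something this sketch assumes. Note that this caps CPU time per period, which is not the same as guaranteeing an equal number of CPU cycles if the clock frequency changes.

```python
from pathlib import Path


def cpu_max_value(cpu_fraction, period_us=100_000):
    """Build the 'quota period' string for a cgroup v2 cpu.max file.

    cpu_fraction=0.5 means half of one CPU per period.
    """
    if cpu_fraction <= 0:
        raise ValueError("cpu_fraction must be positive")
    quota_us = round(cpu_fraction * period_us)
    return f"{quota_us} {period_us}"


def limit_process(cgroup_dir, pid, cpu_fraction):
    """Apply a CPU cap to an existing cgroup and move a process into it.

    cgroup_dir is assumed to be an already-created cgroup v2 directory,
    e.g. a hypothetical /sys/fs/cgroup/bench.
    """
    cgroup = Path(cgroup_dir)
    (cgroup / "cpu.max").write_text(cpu_max_value(cpu_fraction))
    (cgroup / "cgroup.procs").write_text(str(pid))
```

Applying the same cap to both the master run and the branch run would at least put the two sides under the same CPU time budget.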