Benchmarking does not seem to be the primary focus of any one academic field, although the problem has been addressed by many different groups within computer science.
Some papers I found interesting:
- http://research.microsoft.com/en-us/um/people/nswamy/papers/bottlenecks-ecoop04.pdf
- http://www.sigmod.org/publications/sigmod-record/0806/p45.dewitt.pdf
- http://daniel-wilkerson.appspot.com/trend-prof.pdf
- http://sape.inf.usi.ch/sites/default/files/publication/EvaluateCollaboratoryTR1.pdf
- http://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf
- http://buytaert.net/files/oopsla07-georges.pdf
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=49E8694C39A1E8B45581B1B3A57F55BA?doi=10.1.1.43.7647&rep=rep1&type=pd
By far the most basic (and in my mind the most interesting) of these papers is:
Lots of people (including me) have written libraries for benchmarking functions. By far the most interesting I've seen is Bryan O'Sullivan's criterion library.
Other libraries I came across include:
- Codespeed: https://github.com/tobami/codespeed/wiki/Overview
- VBench: http://wesmckinney.com/blog/?p=373
- Benchmark.js: http://benchmarkjs.com/docs
- Trend Prof: http://trend-prof.tigris.org
- Criterion.rs: https://github.com/japaric/criterion.rs
- Rust official benchmarks: http://web.mit.edu/rust-lang_v0.9/doc/guide-testing.html
- Go official benchmarks: http://golang.org/pkg/testing/ (a minimal example follows this list)
- Airspeed Velocity: http://spacetelescope.github.io/asv/using.html
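For concreteness, here is a rough sketch of what the Go testing package's built-in benchmarks look like (the naive fib workload and the names are my own illustration, not something taken from the Go docs). A benchmark is a function whose name starts with Benchmark, lives in a file ending in _test.go, and takes a *testing.B; the testing package chooses b.N so the loop runs long enough to give a stable per-iteration time, and `go test -bench=.` runs it and prints a ns/op figure.

```go
package fib

import "testing"

// Naive Fibonacci as a stand-in workload.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// Assigning to a package-level sink keeps the call from being optimized away.
var result int

// BenchmarkFib20 times fib(20); the framework picks b.N.
func BenchmarkFib20(b *testing.B) {
	for i := 0; i < b.N; i++ {
		result = fib(20)
	}
}
```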
Questions I ask when evaluating these tools:
- Does the benchmarking tool account for uncertainty in its measurements?
- Does the benchmarking tool extrapolate across inputs?
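To make those two questions concrete, here is a hand-rolled sketch (deliberately not tied to any of the tools above; the fib workload, the repetition count, and the input sizes are arbitrary choices of mine). It reruns a workload several times per input size and reports a mean with a standard error. A tool that accounts for uncertainty reports an interval like that instead of a single number; a tool that extrapolates across inputs goes one step further and fits a model to the per-size estimates rather than leaving that step to the reader.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Naive Fibonacci as a stand-in workload.
func fib(n int) int {
	if n < 2 {
		return n
	}
	return fib(n-1) + fib(n-2)
}

// sink keeps the compiler from discarding the benchmarked call.
var sink int

func main() {
	const reps = 20
	for _, n := range []int{20, 25, 30} {
		// Repeat the measurement so its spread can be quantified.
		samples := make([]float64, reps)
		for i := range samples {
			start := time.Now()
			sink = fib(n)
			samples[i] = time.Since(start).Seconds()
		}

		// Mean and standard error of the repeated timings: the kind of
		// interval a tool should report if it accounts for uncertainty.
		mean := 0.0
		for _, s := range samples {
			mean += s
		}
		mean /= float64(reps)
		sumSq := 0.0
		for _, s := range samples {
			sumSq += (s - mean) * (s - mean)
		}
		stdErr := math.Sqrt(sumSq/float64(reps-1)) / math.Sqrt(float64(reps))

		// One line per input size; extrapolating across inputs would mean
		// fitting a curve to these per-size means.
		fmt.Printf("fib(%d): %.6fs +/- %.6fs (std. error over %d runs)\n",
			n, mean, stdErr, reps)
	}
}
```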