-
Goal: Rank solvers independently of PCs, JVMs, processes, OSs, runs, … (as much as possible), so that the ranking depends only on the Solver. ✓
-
Means: Run each benchmark N times (each run on a different JVM) and evaluate the results statistically. ✓
-
How big should N be? (calculate statistically) (Bloch: 30+) X
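One possible way to "calculate statistically" is to estimate N from a few pilot runs. The sketch below assumes a normal approximation at a fixed 95% confidence level and a user-chosen margin of error; the class and method names are hypothetical and not part of any existing benchmarker API.

```java
final class SampleSizeSketch {
    /** z-value for a two-sided 95% confidence interval (normal approximation, an assumption). */
    static final double Z_95 = 1.96;

    /**
     * Estimates how many runs are needed so that the 95% confidence interval
     * of the mean is roughly +/- marginOfError, given the standard deviation
     * observed in a few pilot runs: n = ceil((z * s / E)^2).
     */
    static int requiredRuns(double pilotStdDev, double marginOfError) {
        if (pilotStdDev < 0 || marginOfError <= 0) {
            throw new IllegalArgumentException("stdDev must be >= 0 and margin must be > 0");
        }
        double n = Math.pow(Z_95 * pilotStdDev / marginOfError, 2);
        // At least 2 runs are needed to estimate any variance at all.
        return (int) Math.max(2, Math.ceil(n));
    }
}
```

For example, a pilot standard deviation of 10 and a target margin of 2 gives `requiredRuns(10.0, 2.0)`, which is well above the 30+ rule of thumb. For small N the t-distribution would be more appropriate than the fixed z-value used here.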
-
Student’s t-test ✓ (on the fly? + append another run) X
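A minimal sketch of the two-sample Student's t statistic (pooled variance, so it assumes both solvers' runs have similar variance). It only computes the statistic and the degrees of freedom; turning that into a p-value needs a t-distribution table or a statistics library. All names are hypothetical. Because it works on plain arrays, it could simply be recomputed after appending another run.

```java
final class TTestSketch {
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return sumSq / (xs.length - 1);
    }

    /** Pooled-variance t statistic for the difference of the two means. */
    static double tStatistic(double[] a, double[] b) {
        if (a.length < 2 || b.length < 2) {
            throw new IllegalArgumentException("each sample needs at least 2 runs");
        }
        int na = a.length;
        int nb = b.length;
        double pooledVariance =
                ((na - 1) * sampleVariance(a) + (nb - 1) * sampleVariance(b)) / (na + nb - 2);
        double standardError = Math.sqrt(pooledVariance * (1.0 / na + 1.0 / nb));
        return (mean(a) - mean(b)) / standardError;
    }

    /** Degrees of freedom to use when looking up the critical t value. */
    static int degreesOfFreedom(double[] a, double[] b) {
        return a.length + b.length - 2;
    }
}
```

If the variances of the two solvers differ a lot, Welch's variant of the test may be the safer choice.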
-
-
-
Statistics - results
-
Have to be modular ✓
-
e.g.: avg, min, max, median, geom. mean, std. dev ✓
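One way to keep the statistics modular is a registry of named functions over the raw run results, so that adding a statistic is a single registration. This is only a sketch; the class, the registry and the statistic names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class DescriptiveStatsSketch {
    /** Registry of statistics; each entry maps a name to a function of the run results. */
    static final Map<String, ToDoubleFunction<double[]>> STATISTICS = new LinkedHashMap<>();

    static {
        STATISTICS.put("avg", xs -> Arrays.stream(xs).average().orElse(Double.NaN));
        STATISTICS.put("min", xs -> Arrays.stream(xs).min().orElse(Double.NaN));
        STATISTICS.put("max", xs -> Arrays.stream(xs).max().orElse(Double.NaN));
        STATISTICS.put("median", DescriptiveStatsSketch::median);
        // Geometric mean is only defined for positive values (NaN otherwise).
        STATISTICS.put("geomMean",
                xs -> Math.exp(Arrays.stream(xs).map(Math::log).average().orElse(Double.NaN)));
        STATISTICS.put("stdDev", DescriptiveStatsSketch::sampleStdDev);
    }

    static double median(double[] xs) {
        if (xs.length == 0) {
            return Double.NaN;
        }
        double[] sorted = xs.clone(); // do not reorder the caller's data
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Sample standard deviation (divides by n - 1). */
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) {
            return Double.NaN;
        }
        final double mean = Arrays.stream(xs).average().orElse(Double.NaN);
        double sumSq = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).sum();
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    /** Applies every registered statistic to the given run results. */
    static Map<String, Double> summarize(double[] xs) {
        Map<String, Double> result = new LinkedHashMap<>();
        STATISTICS.forEach((name, statistic) -> result.put(name, statistic.applyAsDouble(xs)));
        return result;
    }
}
```

Note that scores which can be zero or negative make the geometric mean undefined, so it would need a transformed input or would have to be left out for such benchmarks.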
-
-
Discussion with Jirka: evaluate against a function (1 hard ~ 300 soft) X
-
Library support? Implement our own? X
-
-
Report ✓
-
Examples ✓
-
Box plot ✓ JFreeChart? ✓
-
Candlestick diagram ✓ JFreeChart? ✓
-
Difference vs. box plot? ✓
-
-
Violin plot ✓ JFreeChart? X
-
Box plot vs. Violin plot? Choose one. See this article for inspiration.
-
I prefer the Violin plot, but it doesn’t have an implementation in JFreeChart.
-
-
-
Show as a layer above the current summary/other graphs (tabs in tabs) ?
-
Do we have support in JFreeChart? Do we need additional libraries? ✓
-
-
After resolution ?
-
Test reliability: Single thread vs 2 vs 4 ?
-
Validate old performance blog post benchmarks ?
-
-
Research ✓
-
Read up on performance and statistics ✓
-
-
Implementation ✓
-
Excel calculation test X
-
Don’t forget aggregation
-
Benchmarking order: randomized Latin squares ?
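A sketch of how a randomized Latin square could be generated for the run order: row r gives the order in which the n solver configurations run in round r, so each configuration appears exactly once in every position. It shuffles the rows and columns of a cyclic square, which is simple but does not sample uniformly from all possible Latin squares. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class LatinSquareSketch {
    /** Builds an n x n Latin square over 0..n-1 by shuffling a cyclic square. */
    static int[][] randomLatinSquare(int n, Random random) {
        List<Integer> rows = permutation(n, random);
        List<Integer> cols = permutation(n, random);
        int[][] square = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                // (row + col) mod n is Latin; permuting rows and columns keeps it Latin.
                square[r][c] = (rows.get(r) + cols.get(c)) % n;
            }
        }
        return square;
    }

    static List<Integer> permutation(int n, Random random) {
        List<Integer> p = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            p.add(i);
        }
        Collections.shuffle(p, random);
        return p;
    }

    /** Checks that every value 0..n-1 occurs exactly once per row and per column. */
    static boolean isLatinSquare(int[][] square) {
        int n = square.length;
        for (int[] row : square) {
            if (row.length != n) {
                return false;
            }
        }
        for (int i = 0; i < n; i++) {
            boolean[] seenInRow = new boolean[n];
            boolean[] seenInCol = new boolean[n];
            for (int j = 0; j < n; j++) {
                int rowValue = square[i][j];
                int colValue = square[j][i];
                if (rowValue < 0 || rowValue >= n || colValue < 0 || colValue >= n) {
                    return false;
                }
                if (seenInRow[rowValue] || seenInCol[colValue]) {
                    return false;
                }
                seenInRow[rowValue] = true;
                seenInCol[colValue] = true;
            }
        }
        return true;
    }
}
```

Passing a seeded `Random` keeps the run order reproducible between benchmark executions.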
-
Descriptive statistics ✓
-
Score comparator! User-override! + different branch comparators (median, avg, sum, …) X
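A rough sketch of what a user-overridable comparator could look like: the ranking statistic (median, avg, sum, …) is passed in as a function, and the comparator orders the run results of two solvers by it. It assumes scores are plain doubles where higher is better; the class and constant names are hypothetical and not an existing API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

final class RunComparatorSketch {
    static final ToDoubleFunction<double[]> AVG =
            xs -> Arrays.stream(xs).average().orElse(Double.NaN);

    static final ToDoubleFunction<double[]> SUM =
            xs -> Arrays.stream(xs).sum();

    static final ToDoubleFunction<double[]> MEDIAN = xs -> {
        if (xs.length == 0) {
            return Double.NaN;
        }
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    };

    /**
     * Orders the run results of two solvers by the chosen statistic.
     * A positive result means the first solver ranks higher (higher score is better).
     */
    static Comparator<double[]> by(ToDoubleFunction<double[]> statistic) {
        return Comparator.comparingDouble(statistic);
    }
}
```

The choice of statistic can change the ranking: for runs `{-10, -10, -100}` versus `{-30, -30, -30}`, the median prefers the first solver while the average prefers the second, which is one argument for making the comparator user-overridable.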
-
Links ?
-
Last active
September 3, 2015 13:17
-
-
Statistical benchmarking