Created October 5, 2017 19:13
Advice on benchmarking databases
Hi S-------. I'm not an expert in performance benchmarking--I focus on
correctness and safety--but I have a few pieces of advice here.

0. Pick multiple workloads that cover the gamut of behaviors in each DB.
*Don't* choose a completely sharded workload for VoltDB. Don't choose a
purely commutative workload for Cassandra. Cassandra's Paxos
implementation is slow and a good benchmark will demonstrate
that--however, it *doesn't* (I think?) require a global coordinator,
which means it might *scale* better than a single-coordinator system
like, say, VoltDB. Talk about those differences!

1. Have someone independent run the test. Everyone assumes vendor
benchmarks are bullshit--not only because the company is biased to
select workloads which paint them in a good light, but also because they
know the tuning parameters required to adapt their own product to that
specific workload--and in performance-land, tuning is king. Pick a
neutral party with a track record of running independent tests. I...
honestly don't know anyone who does this, but they've gotta be out
there.

2. Get experts from each vendor to tune the test and OS for their
particular DB.

3. Common-denominator tests are helpful, but also keep in mind that the
safety properties and APIs of the DBs will change the shape of queries
dramatically. If it takes 5 queries to do something atomically in
Cassandra, and 1 to do it atomically in Volt, talk about those
differences.

4. Report concurrency, throughput, goodput, *and* latency distributions.
Keep latencies reasonable if you're talking about an online benchmark.
Throughput of 100 kHz is meaningless if it takes 10 seconds to get an
answer to a query a user's waiting for. (A sketch of this kind of report
follows the note.)

5. Benchmarks should take multiple days to run, and should operate on
realistically sized data sets. Lots of storage engines have significant
inflection points at medium to large data volumes. LSM trees often start
real fast but drop off after several days of writing.

6. Use real hardware. It's great to test on cloud stuff too, but real
hardware is gonna make it easier to tell when you've, say, written
enough data to force the SSDs to start reclaiming sectors. TRIM and
rebuild disks between benchmarks. Test both hot and cold. The usual.

7. Okay gosh I have a lot of opinions I should stop here but have fun
and good luck!

--Kyle
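For point 4 in the note above, here's a minimal sketch (mine, not Kyle's) of what a per-run report might look like, assuming each request was logged with its start/end timestamps and a success flag; the record shape and the 500 ms "online" cutoff are illustrative, not prescribed by the note.

```python
# Sketch: summarize a run as concurrency, throughput, goodput, and latency
# percentiles. Assumes each request was logged as (start_ns, end_ns, ok);
# the record shape and the 500 ms "online" cutoff are illustrative only.
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Request:
    start_ns: int  # when the request actually went on the wire
    end_ns: int    # when the response (or error) came back
    ok: bool       # did the database acknowledge it successfully?

def summarize(requests, concurrency, online_cutoff_ms=500.0):
    wall_s = (max(r.end_ns for r in requests) -
              min(r.start_ns for r in requests)) / 1e9
    latencies_ms = sorted((r.end_ns - r.start_ns) / 1e6 for r in requests)
    ok = [r for r in requests if r.ok]
    # Goodput: successful requests that also came back within the cutoff,
    # i.e. answers a waiting user would actually have seen in time.
    good = [r for r in ok if (r.end_ns - r.start_ns) / 1e6 <= online_cutoff_ms]
    pcts = quantiles(latencies_ms, n=100)  # 99 cut points: p1 .. p99
    return {
        "concurrency":   concurrency,
        "throughput_hz": len(requests) / wall_s,
        "goodput_hz":    len(good) / wall_s,
        "error_rate":    1 - len(ok) / len(requests),
        "latency_ms":    {"p50": pcts[49], "p90": pcts[89],
                          "p99": pcts[98], "max": latencies_ms[-1]},
    }
```

The particular percentiles and cutoff are arbitrary; the point is just that a single throughput figure hides the error rate and the tail.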
I recommend reading http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html as well.
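That article is about coordinated omission: a closed-loop generator that waits for slow responses before sending the next request quietly drops the backlog from its latency numbers. Below is a minimal open-loop sketch of the alternative it argues for, with `issue_request` standing in for whatever async call your client library actually exposes; latency is charged from the intended send time, not the actual one.

```python
# Sketch of an open-loop (constant-rate) load generator. Requests are
# scheduled on a fixed timeline regardless of how slow earlier responses
# are, and latency is measured from the *intended* send time, so stalls
# show up as latency rather than silently lowering the offered rate.
# `issue_request` is a placeholder for your client's async call.
import asyncio
import time

async def run(issue_request, rate_hz, duration_s):
    interval = 1.0 / rate_hz
    t0 = time.monotonic()
    latencies, tasks = [], []

    async def one(scheduled):
        await issue_request()
        latencies.append(time.monotonic() - scheduled)

    n = 0
    while (scheduled := t0 + n * interval) < t0 + duration_s:
        await asyncio.sleep(max(0.0, scheduled - time.monotonic()))
        tasks.append(asyncio.create_task(one(scheduled)))
        n += 1
    await asyncio.gather(*tasks)
    return latencies

# e.g. asyncio.run(run(my_async_query, rate_hz=1000, duration_s=60))
```

A real harness would also bound outstanding requests and log errors, but the key difference from a closed-loop client is where the clock starts.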
Good advice. I would add that benchmarks should be reproducible if report/blog readers are willing to spend the time and money on it. I'm sure it almost never happens, but having access to all the config (DB, OS, cloud, etc.) and the actual workload and data makes me trust a benchmark more.
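One cheap way to do that, as a rough sketch (every path and field here is hypothetical; capture whatever actually shaped your run): drop a manifest next to the results containing the DB and OS config, the workload definition, and the dataset seed.

```python
# Sketch: write a manifest next to the benchmark results so readers could,
# in principle, reproduce the run. All paths and fields are illustrative.
import json
import platform
import subprocess
import time

def write_manifest(results_dir, db_config_path, workload_path, dataset_seed):
    manifest = {
        "timestamp":    time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "os":           platform.platform(),
        "db_config":    open(db_config_path).read(),
        "workload":     open(workload_path).read(),
        "dataset_seed": dataset_seed,
        # Assumes the harness lives in a git repo; drop this otherwise.
        "harness_rev":  subprocess.run(["git", "rev-parse", "HEAD"],
                                       capture_output=True, text=True).stdout.strip(),
    }
    with open(f"{results_dir}/manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
```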
I'm also skeptical there's a truly independent party out there... Jepsen's ethics policy is pretty unique in the consulting space. I've worked with some that are better than others though.