Created October 5, 2017 19:13
Advice on benchmarking databases
Hi S-------. I'm not an expert in performance benchmarking--I focus on
correctness and safety--but I have a few pieces of advice here.

0. Pick multiple workloads that cover the gamut of behaviors in each DB.
*Don't* choose a completely sharded workload for VoltDB. Don't choose a
purely commutative workload for Cassandra. Cassandra's Paxos
implementation is slow and a good benchmark will demonstrate
that--however, it *doesn't* (I think?) require a global coordinator,
which means it might *scale* better than a single-coordinator system
like, say, VoltDB. Talk about those differences!

1. Have someone independent run the test. Everyone assumes vendor
benchmarks are bullshit--not only because the company is biased to
select workloads which paint them in a good light, but also because they
know the tuning parameters required to adapt their own product to that
specific workload--and in performance-land, tuning is king. Pick a
neutral party with a track record of running independent tests. I...
honestly don't know anyone who does this, but they've gotta be out
there.

2. Get experts from each vendor to tune the test and OS for their
particular DB.

3. Common-denominator tests are helpful, but also keep in mind that the
safety properties and APIs of the DBs will change the shape of queries
dramatically. If it takes 5 queries to do something atomically in
Cassandra, and 1 to do it atomically in Volt, talk about those
differences.

4. Report concurrency, throughput, goodput, *and* latency distributions.
Keep latencies reasonable if you're talking about an online benchmark.
Throughput of 100 kHz is meaningless if it takes 10 seconds to get an
answer to a query a user's waiting for. (A sketch of this kind of report
follows the note.)

5. Benchmarks should take multiple days to run, and should operate on
realistically sized data sets. Lots of storage engines have significant
inflection points at medium to large data volumes. LSM trees often start
real fast but drop off after several days of writing.

6. Use real hardware. It's great to test on cloud stuff too, but real
hardware is gonna make it easier to tell when you've, say, written
enough data to force the SSDs to start reclaiming sectors. TRIM and
rebuild disks between benchmarks. Test both hot and cold. The usual.

7. Okay gosh I have a lot of opinions I should stop here but have fun
and good luck!

--Kyle
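For point 4 in the note above, here's a minimal sketch (mine, not Kyle's) of what a per-run report might look like, assuming each request was logged with its start/end timestamps and a success flag; the record shape and the 500 ms "online" cutoff are illustrative, not prescribed by the note.

```python
# Sketch: summarize a run as concurrency, throughput, goodput, and latency
# percentiles. Assumes each request was logged as (start_ns, end_ns, ok);
# the record shape and the 500 ms "online" cutoff are illustrative only.
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Request:
    start_ns: int  # when the request actually went on the wire
    end_ns: int    # when the response (or error) came back
    ok: bool       # did the database acknowledge it successfully?

def summarize(requests, concurrency, online_cutoff_ms=500.0):
    wall_s = (max(r.end_ns for r in requests) -
              min(r.start_ns for r in requests)) / 1e9
    latencies_ms = sorted((r.end_ns - r.start_ns) / 1e6 for r in requests)
    ok = [r for r in requests if r.ok]
    # Goodput: successful requests that also came back within the cutoff,
    # i.e. answers a waiting user would actually have seen in time.
    good = [r for r in ok if (r.end_ns - r.start_ns) / 1e6 <= online_cutoff_ms]
    pcts = quantiles(latencies_ms, n=100)  # 99 cut points: p1 .. p99
    return {
        "concurrency":   concurrency,
        "throughput_hz": len(requests) / wall_s,
        "goodput_hz":    len(good) / wall_s,
        "error_rate":    1 - len(ok) / len(requests),
        "latency_ms":    {"p50": pcts[49], "p90": pcts[89],
                          "p99": pcts[98], "max": latencies_ms[-1]},
    }
```

The particular percentiles and cutoff are arbitrary; the point is just that a single throughput figure hides the error rate and the tail.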
I recommend reading http://highscalability.com/blog/2015/10/5/your-load-generator-is-probably-lying-to-you-take-the-red-pi.html as well.
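That article is about coordinated omission: a closed-loop generator that waits for slow responses before sending the next request quietly drops the backlog from its latency numbers. Below is a minimal open-loop sketch of the alternative it argues for, with `issue_request` standing in for whatever async call your client library actually exposes; latency is charged from the intended send time, not the actual one.

```python
# Sketch of an open-loop (constant-rate) load generator. Requests are
# scheduled on a fixed timeline regardless of how slow earlier responses
# are, and latency is measured from the *intended* send time, so stalls
# show up as latency rather than silently lowering the offered rate.
# `issue_request` is a placeholder for your client's async call.
import asyncio
import time

async def run(issue_request, rate_hz, duration_s):
    interval = 1.0 / rate_hz
    t0 = time.monotonic()
    latencies, tasks = [], []

    async def one(scheduled):
        await issue_request()
        latencies.append(time.monotonic() - scheduled)

    n = 0
    while (scheduled := t0 + n * interval) < t0 + duration_s:
        await asyncio.sleep(max(0.0, scheduled - time.monotonic()))
        tasks.append(asyncio.create_task(one(scheduled)))
        n += 1
    await asyncio.gather(*tasks)
    return latencies

# e.g. asyncio.run(run(my_async_query, rate_hz=1000, duration_s=60))
```

A real harness would also bound outstanding requests and log errors, but the key difference from a closed-loop client is where the clock starts.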
Good advice. I would add that benchmarks should be reproducible if report/blog readers are willing to spend the time and money on it. I'm sure it almost never happens, but having access to all the config (DB, OS, cloud, etc.) and the actual workload and data makes me trust a benchmark more.
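One cheap way to do that, as a rough sketch (every path and field here is hypothetical; capture whatever actually shaped your run): drop a manifest next to the results containing the DB and OS config, the workload definition, and the dataset seed.

```python
# Sketch: write a manifest next to the benchmark results so readers could,
# in principle, reproduce the run. All paths and fields are illustrative.
import json
import platform
import subprocess
import time

def write_manifest(results_dir, db_config_path, workload_path, dataset_seed):
    manifest = {
        "timestamp":    time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "os":           platform.platform(),
        "db_config":    open(db_config_path).read(),
        "workload":     open(workload_path).read(),
        "dataset_seed": dataset_seed,
        # Assumes the harness lives in a git repo; drop this otherwise.
        "harness_rev":  subprocess.run(["git", "rev-parse", "HEAD"],
                                       capture_output=True, text=True).stdout.strip(),
    }
    with open(f"{results_dir}/manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
```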
I'm also skeptical there's a truly independent party out there... Jepsen's ethics policy is pretty unique in the consulting space. I've worked with some that are better than others though.