As we review replacing Folsom with Exometer for metrics collection within Riak and Riak Core, we must verify that Exometer's performance overhead is less than or equal to Folsom's. We also need to understand how Exometer's measurement overhead behaves as load increases, to ensure that it remains constant. To measure their respective overhead, Riak will be configured as follows to isolate metrics collection operations from other Riak work (e.g. GETs, PUTs, DELETEs, handoffs, etc.):
- Ring size: 32
- Backend: yessir
- AAE, Search, Yokozuna, and Strong Consistency disabled
To measure overhead, one instance of Basho Bench (b_b) will execute the metrics_punisher configuration attached to this gist against a Riak cluster. A second instance of b_b will simultaneously execute the stats_query configuration attached to this gist, querying the cluster's stats endpoint. Based on discussions with Russell and MvM, it is critical that the b_b instances run on dedicated hosts, separate from the dedicated hosts running the Riak cluster.
The initial round of testing will use a single-node Riak cluster and compare the following Riak versions:
- 2.0.0: Riak 2.0.0
- 2.0.0-nullified: Riak 2.0.0 with the riak_kv_stats:update function modified to return ok rather than call into riak_core stats
- feuer-exometer2: Riak branch with Exometer integration
- feuer-exometer2-nullified: Riak branch with Exometer integration and the riak_kv_stats:update function modified to return ok rather than call into riak_core stats
For each of these versions, scenarios with 0, 1, 2, 5, 10, and 50 stats clients will be executed. The nullified builds provide baseline values for determining the overhead of the respective stats subsystem. It is expected that the performance of the Exometer branch will be practically identical to that of 2.0.0 and that the Exometer overhead will be less than or equal to the Folsom overhead in 2.0.0. During all test runs, message queue lengths will be monitored through etop to ensure that unbounded growth does not occur.
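The comparison described above can be sketched in Python as follows. This is an illustration only and is not part of perf-test.sh: the function names, the `-nullified` suffix lookup, and the choice of throughput (ops/sec) as the compared metric are assumptions made for this sketch, and the same arithmetic could be applied to latency instead.

```python
from itertools import product

# Versions and stats-client counts taken from the test plan above.
VERSIONS = [
    "2.0.0",
    "2.0.0-nullified",
    "feuer-exometer2",
    "feuer-exometer2-nullified",
]
STATS_CLIENTS = [0, 1, 2, 5, 10, 50]


def test_matrix():
    """Return every (version, stats_clients) scenario to run."""
    return list(product(VERSIONS, STATS_CLIENTS))


def relative_overhead(baseline_ops, measured_ops):
    """Fraction of baseline throughput lost to the stats subsystem.

    baseline_ops: throughput of the nullified build (assumed metric: ops/sec).
    measured_ops: throughput of the matching instrumented build.
    """
    if baseline_ops <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_ops - measured_ops) / baseline_ops


def overhead_by_clients(results, version):
    """Map stats-client count -> relative overhead for one version.

    results: dict keyed by (version, stats_clients) with throughput values.
    The baseline is looked up under the hypothetical key version + "-nullified".
    Scenarios missing either measurement are skipped.
    """
    baseline = version + "-nullified"
    return {
        n: relative_overhead(results[(baseline, n)], results[(version, n)])
        for n in STATS_CLIENTS
        if (version, n) in results and (baseline, n) in results
    }
```

Calling `overhead_by_clients(results, "2.0.0")` and `overhead_by_clients(results, "feuer-exometer2")` on the collected throughput figures would give two curves over the stats-client counts. The expectation stated above corresponds to the Exometer curve staying at or below the Folsom curve and remaining roughly flat as the number of stats clients grows.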
The shell script (perf-test.sh) that implements this process is attached to this gist.