This document will allow anyone to verify the benchmark result of writing 2 - 3 million metrics per second into DalmatinerDB. This is a single node benchmark to keep things simple and easily comparable between time series databases that don't support clustering.
We will set up 2 Haggar servers to generate metrics and fire them at a single node DalmatinerDB server, as per this diagram.
You can expect near linear performance results as a DalmatinerDB cluster is horizontally scaled.
Query performance and storage compression will be handled in separate benchmarks.
We picked a moderately sized server for testing that is relatively cheap to spin up for a few hours on GCE or AWS. At the time of writing an n1-standard-16 with a local SSD disk is $0.673 per hour.
- 1 x DalmatinerDB server
- GCE n1-standard-16 (16 cpu, 60GB memory, 1 x 375G local SSD disk)
- 2 x Haggar load generating servers
- GCE n1-highcpu-8 (8 cpu, 8GB memory, 100GB disk)
The equivalent size DalmatinerDB hardware choice on AWS would be hi1.4xlarge which is 16 cpu, 60GB memory and 2 x 1TB local SSD disks.
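For reference, instances of roughly this shape can be created with gcloud along the following lines (the instance names, zone and Ubuntu image here are placeholders we picked for illustration, not part of the original setup):

gcloud compute instances create ddb01 \
    --zone=us-central1-b --machine-type=n1-standard-16 \
    --image-family=ubuntu-1604-lts --image-project=ubuntu-os-cloud \
    --local-ssd=interface=scsi

gcloud compute instances create haggar01 haggar02 \
    --zone=us-central1-b --machine-type=n1-highcpu-8 \
    --image-family=ubuntu-1604-lts --image-project=ubuntu-os-cloud \
    --boot-disk-size=100GB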
The Haggar servers use less than 2GB of memory and around 20% cpu with negligible disk usage. Each Haggar server will generate approximately 20Mb/s of network traffic to the DalmatinerDB server.
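You can sanity check those numbers on each load generator while the benchmark is running, for example with the sysstat tools (assuming the sysstat package is installed):

sar -u 1        # CPU utilisation per second
sar -r 1        # memory usage
sar -n DEV 1    # per-interface network throughput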
We benchmarked the locally attached SSD disk on the GCE server with fio.
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test \
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
The GCE server local storage benchmarked at approximately 20,000 IOPS write and 70,000 IOPS read.
Like all benchmarks we've done a bit of tweaking for the size of the hardware. These are two settings that you would change in a real world single node scenario. The default settings are more applicable for scaling out to 5+ nodes in a cluster over time, which is what we believe most people will want to do.
If you don't change these defaults you will still see performance of 2.5 - 3 million metrics per second. However, we expect everyone else to optimise, so to set a level playing field we have too.
The default ring size is 64, which is great for a single node that will be scaled out to more nodes in future. We changed the ring size to 16 for this benchmark as that is more appropriate for a single node server. Changing the ring size means wiping the data.
To change the ring size edit /etc/ddb/dalmatinerdb.conf:
ring_size=16
The default of 120 points in cache is good for a 5 node cluster (or beyond) but isn't optimised for a single node server, hence we bumped up this setting. You can tweak this setting between restarts to fit the size of your RAM.
In /etc/ddb/dalmatinerdb.conf:
cache_points = 600
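To apply both changes DalmatinerDB needs a restart, and the ring size change also requires starting from an empty data directory. A minimal sketch, assuming DalmatinerDB runs as a systemd service called dalmatinerdb and stores its data in /data/ddb as created below (adjust the service name and path to your install):

systemctl stop dalmatinerdb
rm -rf /data/ddb/*                # the ring size change means wiping existing data
vi /etc/ddb/dalmatinerdb.conf     # set ring_size and cache_points as above
systemctl start dalmatinerdb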
Set up the DalmatinerDB server as per this setup doc:
https://gist.github.com/sacreman/9015bf466b4fa2a654486cd79b777e64
You will need to modify the disk configuration slightly for a single locally attached SSD. The setup document assumes no additional SSD disk so that it can be played with in a VM easily.
mkdir /data
zpool create -f -o ashift=12 data /dev/sdb
zfs create -o compression=lz4 -o atime=off -o logbias=throughput data/ddb
chown dalmatiner: /data/ddb
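It's worth confirming that the pool and dataset properties took effect before starting the benchmark:

zpool status data
zfs get compression,atime,logbias data/ddb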
We are using a modified version of Haggar that includes the DalmatinerDB binary protocol output. To set this up on each load server (with a GOPATH-based Go toolchain, go get compiles the haggar binary into $GOPATH/bin; the commands below assume it is in your working directory):
go get github.com/dalmatinerdb/haggar
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar1" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar2" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar3" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar4" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar13" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar14" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar15" &
nohup ./haggar -agents=50 -carbon="ddb01:5555" -flush-interval=1s -jitter=1s -metrics=6000 -prefix="haggar16" &
This is 8 processes (4 per load server, run over the network), each simulating 50 agents sending 6000 metrics at 1 second resolution with no batching, which works out to 8 x 50 x 6000 = 2.4 million metrics per second.
You should expect to see a consistent 2 - 3 million metrics per second. You can view your results in the Dalmatiner front end at the following address:
The Haggar load testing tool takes about 10 minutes to build up to full speed. This benchmark has been left running for a few days and performance has stayed level.
We ran the benchmark over 12 hours and calculated the 6 hour average, minimum and maximum throughput. The differences are caused by the 1 second jitter in the Haggar benchmark tool, which is designed to more closely emulate the slight fluctuations of a real world workload.
During peak load DalmatinerDB uses approximately 50% cpu on all 16 cores, approximately 50GB of memory, and the disk spikes to 30MB/s read and 50MB/s write.
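One way to watch these numbers on the DalmatinerDB server during the run (assuming the sysstat tools are installed alongside the ZFS utilities):

mpstat -P ALL 5           # per-core CPU utilisation
free -h                   # memory usage
zpool iostat -v data 5    # read/write throughput on the ZFS pool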
DalmatinerDB is bottlenecked by memory on this benchmark. On tests performed with 100GB+ memory DalmatinerDB starts to bottleneck on CPU and disk at approximately 4 million metrics per second. If you have the money and the time feel free to run this benchmark on a mega box and let us know what numbers you get.
Although the purpose of this benchmark was not to test storage efficiency, we did end up with a 12 hour data set. DalmatinerDB advertises 1 byte per data point after compression. In this particular test the storage works out to 3.5 bits (not bytes!) per data point.
root@ddb-bench:~# zfs get all data/ddb | grep compressratio
data/ddb  compressratio     18.55x  -
data/ddb  refcompressratio  18.55x  -
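The two figures are at least consistent: if you assume DalmatinerDB writes roughly 8 bytes (64 bits) per point before ZFS compression, then 64 / 18.55 ≈ 3.5 bits per point on disk. Treat the 8 bytes per point as our assumption here rather than a statement about the on-disk format.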