@hyc
Last active July 20, 2017 11:19
More fun with InfluxDB
I believe this is directly comparable to the results published here:
http://influxdb.com/blog/2014/06/20/leveldb_vs_rocksdb_vs_hyperleveldb_vs_lmdb_performance.html
My laptop has 8GB of RAM, but I pared that down to 4GB by turning off swap and creating a file in tmpfs large enough to drop free RAM to 4GB.
These results are from the code prior to using Sorted Duplicates. The RocksDB performance is amazingly poor.
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 10m0.973944912s (6.009739 microsecond per point)
Querying 100000000 points took 4m0.062210717s (2.400622 microseconds per point)
Size: 6.1G
Took 1m19.017894891s to delete 50000000 points
Took 2.095us to compact
Querying 50000000 points took 1m29.304574146s (1.786091 microseconds per point)
Size: 7.6G
Writing 50000000 points in batches of 1000 points took 9m36.931839789s (11.538637 microsecond per point)
Size: 7.7G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 39m50.903262204s (23.909033 microsecond per point)
Querying 100000000 points took 2m49.339779425s (1.693398 microseconds per point)
Size: 2.7G
Took 5m48.831738377s to delete 50000000 points
Took 6m17.357548286s to compact
Querying 50000000 points took 1m0.168453865s (1.203369 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m14.040395323s (19.480808 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h25m10.762258086s (123.107623 microsecond per point)
Querying 100000000 points took 2m26.217626808s (1.462176 microseconds per point)
Size: 37G
Took 8m45.677135051s to delete 50000000 points
Took 2m55.372818028s to compact
Querying 50000000 points took 1m1.570714964s (1.231414 microseconds per point)
Size: 37G
Writing 50000000 points in batches of 1000 points took 2h1m51.42641092s (146.228528 microsecond per point)
Size: 58G
################ Benchmarking: hyperleveldb
Writing 100000000 points in batches of 1000 points took 9m9.924859094s (5.499249 microsecond per point)
Querying 100000000 points took 9m32.667573668s (5.726676 microseconds per point)
Size: 3.3G
Took 5m47.830141963s to delete 50000000 points
Took 6m39.712762331s to compact
Querying 50000000 points took 1m22.704782776s (1.654096 microseconds per point)
Size: 1.6G
Writing 50000000 points in batches of 1000 points took 4m24.807726459s (5.296155 microsecond per point)
Size: 3.5G
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 22667.93
System time (seconds): 6365.04
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 8:12:30
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 27072656
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1177869
Minor (reclaiming a frame) page faults: 90486770
Voluntary context switches: 669563529
Involuntary context switches: 14246002
Swaps: 0
File system inputs: 63595816
File system outputs: 590122424
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This was another run on the same machine with the same code. Strangely, RocksDB's result was completely different this time, and the run ended with the program crashing.
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 12m14.22774633s (7.342277 microsecond per point)
Querying 100000000 points took 6m1.96092097s (3.619609 microseconds per point)
Size: 7.6G
Took 6m23.96630963s to delete 50000000 points
Took 1.048us to compact
Querying 50000000 points took 4m0.083501501s (4.801670 microseconds per point)
Size: 7.6G
Writing 50000000 points in batches of 1000 points took 1h28m28.45283235s (106.169057 microsecond per point)
Size: 8.2G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 39m32.39700747s (23.723970 microsecond per point)
Querying 100000000 points took 3m6.89910029s (1.868991 microseconds per point)
Size: 2.7G
Took 5m39.404872895s to delete 50000000 points
Took 6m14.918991943s to compact
Querying 50000000 points took 1m0.488077474s (1.209762 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m28.047675968s (19.760954 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h45m57.166233904s (135.571662 microsecond per point)
Querying 100000000 points took 3m3.470915689s (1.834709 microseconds per point)
Size: 41G
Took 8m33.237626533s to delete 50000000 points
Took 3m47.826396787s to compact
Querying 50000000 points took 51.101206202s (1.022024 microseconds per point)
Size: 41G
Writing 50000000 points in batches of 1000 points took 2h55m7.684545292s (210.153691 microsecond per point)
panic: exit status 1
goroutine 1 [running]:
runtime.panic(0x7cfc80, 0xc2100d3778)
/usr/local/go/src/pkg/runtime/panic.c:266 +0xb6
main.getSize(0xc210117b80, 0x1a, 0x1a, 0x4)
/home/software/influxdb/src/tools/benchmark-storage/main.go:54 +0x130
main.benchmarkDbCommon(0x7f9114265198, 0xc21001fc00, 0x5f5e100, 0x3e8, 0x7a120, ...)
/home/software/influxdb/src/tools/benchmark-storage/main.go:97 +0x811
main.benchmark(0x7eb180, 0x7, 0x5f5e100, 0x3e8, 0x7a120, ...)
/home/software/influxdb/src/tools/benchmark-storage/main.go:47 +0x269
main.main()
/home/software/influxdb/src/tools/benchmark-storage/main.go:32 +0x454
goroutine 3 [chan receive]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc2100492c0, 0x7f9114255fe8, 0xc210000008)
/home/software/influxdb/src/code.google.com/p/log4go/termlog.go:27 +0x60
created by code.google.com/p/log4go.NewConsoleLogWriter
/home/software/influxdb/src/code.google.com/p/log4go/termlog.go:19 +0x67
goroutine 4 [syscall]:
runtime.goexit()
/usr/local/go/src/pkg/runtime/proc.c:1394
goroutine 6 [finalizer wait]:
runtime.park(0x5d8210, 0xc68380, 0xc571a8)
/usr/local/go/src/pkg/runtime/proc.c:1342 +0x66
runfinq()
/usr/local/go/src/pkg/runtime/mgc0.c:2279 +0x84
runtime.goexit()
/usr/local/go/src/pkg/runtime/proc.c:1394
Command exited with non-zero status 2
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 21330.49
System time (seconds): 8025.84
Percent of CPU this job got: 78%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:21:55
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 17911936
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 22018495
Minor (reclaiming a frame) page faults: 52586768
Voluntary context switches: 884225141
Involuntary context switches: 12885920
Swaps: 0
File system inputs: 401123272
File system outputs: 747221368
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
This is the LMDB result using the Sorted Duplicates patches:
https://github.com/influxdb/influxdb/pull/678
I don't have the RocksDB result yet; it will be several more hours before that finishes.
... updated: finally finished.
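Before the numbers, here is roughly what the Sorted Duplicates change means at the LMDB API level. This is a minimal C sketch of mine, not the actual patch, and the key/value layout is illustrative: instead of giving every point its own composite key, a DBI opened with MDB_DUPSORT stores all points for a series as sorted duplicate values under a single series key.

/* Minimal sketch, not the actual patch: store many points under one
 * series key. The dbi must be opened with the MDB_DUPSORT flag, e.g.
 *   mdb_dbi_open(txn, "points", MDB_DUPSORT|MDB_CREATE, &dbi);
 */
#include <string.h>
#include "lmdb.h"

int write_point(MDB_env *env, MDB_dbi dbi,
                const char *series, const void *point, size_t len)
{
    MDB_txn *txn;
    MDB_val key, data;
    int rc;

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;

    key.mv_size = strlen(series);    /* one key per series... */
    key.mv_data = (void *)series;
    data.mv_size = len;              /* ...many sorted values per key */
    data.mv_data = (void *)point;

    rc = mdb_put(txn, dbi, &key, &data, 0);
    if (rc) { mdb_txn_abort(txn); return rc; }
    return mdb_txn_commit(txn);
}

Duplicate values are kept sorted by LMDB's default byte-order comparison, so if each value starts with a big-endian timestamp, the points for a series stay in time order with no extra index.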
violino:/home/software/influxdb> ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 9m55.060557828s (5.950606 microsecond per point)
Querying 100000000 points took 1m26.153283998s (0.861533 microseconds per point)
Size: 4.0G
Took 1m11.705748913s to delete 50000000 points
Took 1.257us to compact
Querying 50000000 points took 43.994534804s (0.879891 microseconds per point)
Size: 4.0G
Writing 50000000 points in batches of 1000 points took 5m32.398039417s (6.647961 microsecond per point)
Size: 5.9G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 40m8.701727125s (24.087017 microsecond per point)
Querying 100000000 points took 3m39.413232183s (2.194132 microseconds per point)
Size: 2.7G
Took 17m48.421502672s to delete 50000000 points
Took 6m13.689504673s to compact
Querying 50000000 points took 1m1.125226854s (1.222505 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m21.570047473s (19.631401 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h10m25.346725469s (114.253467 microsecond per point)
Querying 100000000 points took 2m26.002405473s (1.460024 microseconds per point)
Size: 35G
Took 16m40.54319908s to delete 50000000 points
Took 3m3.3481798s to compact
Querying 50000000 points took 58.448312524s (1.168966 microseconds per point)
Size: 36G
Writing 50000000 points in batches of 1000 points took 2h11m27.871520367s (157.757430 microsecond per point)
Size: 59G
################ Benchmarking: hyperleveldb
Writing 100000000 points in batches of 1000 points took 9m10.276314813s (5.502763 microsecond per point)
Querying 100000000 points took 12m8.949611018s (7.289496 microseconds per point)
Size: 3.3G
Took 5m11.934801159s to delete 50000000 points
Took 10m31.038632478s to compact
Querying 50000000 points took 1m24.106956728s (1.682139 microseconds per point)
Size: 1.6G
Writing 50000000 points in batches of 1000 points took 4m30.184909667s (5.403698 microsecond per point)
Size: 3.4G
LMDB doesn't have the plethora of complex tuning APIs that other databases do, but it *does* have some worthwhile
data access features that other databases don't. Learning to use them correctly is well worth the trouble.
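For example, with points stored as sorted duplicates, a cursor can scan an entire series in sorted order after a single seek. A minimal sketch (the function name is mine; assumes a dbi opened with MDB_DUPSORT):

#include <string.h>
#include "lmdb.h"

int scan_series(MDB_txn *txn, MDB_dbi dbi, const char *series)
{
    MDB_cursor *cur;
    MDB_val key, data;
    int rc;

    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) return rc;

    key.mv_size = strlen(series);
    key.mv_data = (void *)series;

    /* Seek once to the first duplicate under this series key... */
    rc = mdb_cursor_get(cur, &key, &data, MDB_SET_KEY);
    /* ...then walk the rest of the duplicates in sorted order. */
    while (rc == 0) {
        /* process data.mv_data / data.mv_size here */
        rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT_DUP);
    }
    mdb_cursor_close(cur);
    return rc == MDB_NOTFOUND ? 0 : rc;
}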
violino:~/OD/mdb/libraries/liblmdb> ls -l /home/test/db/
total 119
drwxr-xr-x 2 hyc hyc 18904 Jun 22 22:31 test-hyperleveldb
drwxr-xr-x 2 hyc hyc 43128 Jun 22 15:56 test-leveldb
drwxr-xr-x 2 hyc hyc 96 Jun 22 14:01 test-lmdb
drwxr-xr-x 2 hyc hyc 60112 Jun 22 21:43 test-rocksdb
violino:~/OD/mdb/libraries/liblmdb> du !$
du /home/test/db/
3568158 /home/test/db/test-hyperleveldb
6084152 /home/test/db/test-lmdb
61190722 /home/test/db/test-rocksdb
2689903 /home/test/db/test-leveldb
73532934 /home/test/db/
The data files were not touched after running the test. From the directory timestamps you can see that LevelDB didn't finish until almost 2 hours after the LMDB test ended, RocksDB ended almost 6 hours after LevelDB, and HyperLevelDB took about 45 minutes more after that.
I added a new -c (compact) option to mdb_copy, which copies the DB sequentially, omitting freed/deleted pages; a sketch of the equivalent C API call is below.
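The same compaction is available from C via mdb_env_copy2() with the MDB_CP_COMPACT flag (both added in LMDB 0.9.14 along with the mdb_copy -c option); a minimal sketch:

#include "lmdb.h"

/* Compacting copy of an environment, equivalent to `mdb_copy -c src dst`.
 * dst must be an existing, empty, writable directory. */
int compact_copy(const char *src, const char *dst)
{
    MDB_env *env;
    int rc;

    rc = mdb_env_create(&env);
    if (rc) return rc;
    rc = mdb_env_open(env, src, MDB_RDONLY, 0664);
    if (rc == 0)
        rc = mdb_env_copy2(env, dst, MDB_CP_COMPACT);  /* sequential copy, freed pages omitted */
    mdb_env_close(env);
    return rc;
}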
Starting with the usual run, interrupt it after half the records are deleted:
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 10m18.8538945s (6.188539 microsecond per point)
Querying 100000000 points took 1m28.581634191s (0.885816 microseconds per point)
Size: 4.0G
Took 1m13.593047399s to delete 50000000 points
Took 1.118us to compact
^CCommand exited with non-zero status 2
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 845.65
System time (seconds): 74.44
Percent of CPU this job got: 104%
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:42.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16497568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 51
Minor (reclaiming a frame) page faults: 7521009
Voluntary context switches: 6036856
Involuntary context switches: 87408
Swaps: 0
File system inputs: 12152
File system outputs: 60187944
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Then check how compaction behaves:
violino:~/OD/mdb/libraries/liblmdb> /usr/bin/time -v ./mdb_copy -c /home/test/db/test-lmdb/ /home/test/db/x
Command being timed: "./mdb_copy -c /home/test/db/test-lmdb/ /home/test/db/x"
User time (seconds): 1.56
System time (seconds): 6.23
Percent of CPU this job got: 10%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:15.60
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16255568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 1016071
Voluntary context switches: 12714
Involuntary context switches: 1924
Swaps: 0
File system inputs: 600
File system outputs: 8141848
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
violino:~/OD/mdb/libraries/liblmdb> du /home/test/db
4065106 /home/test/db/x
4128788 /home/test/db/test-lmdb
8193894 /home/test/db
This doesn't make a huge difference in space, since only ~14,000 freed pages were in the DB to begin with (14,310 free pages × 4KB/page is only ~56MB out of a ~4GB file):
violino:~/OD/mdb/libraries/liblmdb> ./mdb_stat -ef /home/test/db/test-lmdb/
Environment Info
Map address: (nil)
Map size: 10737418240
Page size: 4096
Max pages: 2621440
Number of pages used: 1029636
Last transaction ID: 600001
Max readers: 126
Number of readers used: 0
Freelist Status
Tree depth: 2
Branch pages: 1
Leaf pages: 40
Overflow pages: 0
Entries: 1395
Free pages: 14310
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 50000000
violino:~/OD/mdb/libraries/liblmdb> ./mdb_stat -ef /home/test/db/x
Environment Info
Map address: (nil)
Map size: 10737418240
Page size: 4096
Max pages: 2621440
Number of pages used: 1015285
Last transaction ID: 1
Max readers: 126
Number of readers used: 0
Freelist Status
Tree depth: 0
Branch pages: 0
Leaf pages: 0
Overflow pages: 0
Entries: 0
Free pages: 0
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 50000000
The test is a bit awkward here too, since it deletes the entries from the middle of the DB. If you were truly expiring records from a time-series database, you would delete from the head of the DB. Deleting in the middle like this leaves a lot of pages half full, instead of totally emptying/freeing pages.
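For comparison, expiring from the head of the DB is just a cursor loop; a minimal sketch of mine (one big txn for brevity; a real expirer would batch deletes across several txns):

#include <stddef.h>
#include "lmdb.h"

/* Delete the n oldest entries from the head of the DB. Whole leaf
 * pages empty out and return to the freelist, instead of being left
 * half full as with deletes from the middle. */
int expire_head(MDB_env *env, MDB_dbi dbi, size_t n)
{
    MDB_txn *txn;
    MDB_cursor *cur;
    MDB_val key, data;
    int rc;

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;
    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) { mdb_txn_abort(txn); return rc; }

    while (n-- > 0) {
        rc = mdb_cursor_get(cur, &key, &data, MDB_FIRST);  /* current head */
        if (rc) break;                      /* MDB_NOTFOUND: DB is empty */
        rc = mdb_cursor_del(cur, 0);
        if (rc) break;
    }
    mdb_cursor_close(cur);
    if (rc && rc != MDB_NOTFOUND) { mdb_txn_abort(txn); return rc; }
    return mdb_txn_commit(txn);
}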
@jvshahid

Did you run the tests on an SSD or a spinning disk?

@hyc
hyc (Author) commented Jun 23, 2014

This is on a Crucial M4 512GB SSD.

@nodtem66

Thanks for sharing these benchmarks. I'm looking into InfluxDB optimization.
