More fun with InfluxDB
I believe this is directly comparable to the results published here:
http://influxdb.com/blog/2014/06/20/leveldb_vs_rocksdb_vs_hyperleveldb_vs_lmdb_performance.html
My laptop has 8GB of RAM, but I pared that down by turning off swap and creating a file in tmpfs large enough to drop free RAM to 4GB.
This run uses the code prior to the Sorted Duplicates support.
The RocksDB performance is amazingly poor.
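(For anyone reproducing the setup: any file written into tmpfs occupies RAM until it is removed, so filling one is enough to take memory away from the page cache. A minimal C sketch of that trick, with a made-up path and a hard-coded size; in practice you would grow the file until free RAM drops to the target. A single dd into /dev/shm has the same effect.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pin roughly 4GB of RAM by filling a tmpfs file; delete the file to release it.
   The path and size are illustrative only. */
int main(void) {
    const size_t total = (size_t)4 << 30;        /* bytes to pin in tmpfs (~4GB) */
    const size_t chunk = (size_t)1 << 20;        /* write 1MB at a time */
    char *buf = malloc(chunk);
    FILE *f = fopen("/dev/shm/ram-filler", "w");
    size_t written = 0;

    if (!buf || !f) { perror("setup"); return 1; }
    memset(buf, 0xAA, chunk);                    /* any written data occupies tmpfs pages */
    while (written < total) {
        if (fwrite(buf, 1, chunk, f) != chunk) { perror("fwrite"); break; }
        written += chunk;
    }
    fclose(f);
    free(buf);
    printf("pinned %zu MB in tmpfs\n", written >> 20);
    return 0;
}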
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 10m0.973944912s (6.009739 microsecond per point)
Querying 100000000 points took 4m0.062210717s (2.400622 microseconds per point)
Size: 6.1G
Took 1m19.017894891s to delete 50000000 points
Took 2.095us to compact
Querying 50000000 points took 1m29.304574146s (1.786091 microseconds per point)
Size: 7.6G
Writing 50000000 points in batches of 1000 points took 9m36.931839789s (11.538637 microsecond per point)
Size: 7.7G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 39m50.903262204s (23.909033 microsecond per point)
Querying 100000000 points took 2m49.339779425s (1.693398 microseconds per point)
Size: 2.7G
Took 5m48.831738377s to delete 50000000 points
Took 6m17.357548286s to compact
Querying 50000000 points took 1m0.168453865s (1.203369 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m14.040395323s (19.480808 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h25m10.762258086s (123.107623 microsecond per point)
Querying 100000000 points took 2m26.217626808s (1.462176 microseconds per point)
Size: 37G
Took 8m45.677135051s to delete 50000000 points
Took 2m55.372818028s to compact
Querying 50000000 points took 1m1.570714964s (1.231414 microseconds per point)
Size: 37G
Writing 50000000 points in batches of 1000 points took 2h1m51.42641092s (146.228528 microsecond per point)
Size: 58G
################ Benchmarking: hyperleveldb
Writing 100000000 points in batches of 1000 points took 9m9.924859094s (5.499249 microsecond per point)
Querying 100000000 points took 9m32.667573668s (5.726676 microseconds per point)
Size: 3.3G
Took 5m47.830141963s to delete 50000000 points
Took 6m39.712762331s to compact
Querying 50000000 points took 1m22.704782776s (1.654096 microseconds per point)
Size: 1.6G
Writing 50000000 points in batches of 1000 points took 4m24.807726459s (5.296155 microsecond per point)
Size: 3.5G
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 22667.93
System time (seconds): 6365.04
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 8:12:30
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 27072656
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1177869
Minor (reclaiming a frame) page faults: 90486770
Voluntary context switches: 669563529
Involuntary context switches: 14246002
Swaps: 0
File system inputs: 63595816
File system outputs: 590122424
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
This was another run on the same machine with the same code. Strangely, RocksDB's result is completely different this time, and the run ended with the benchmark crashing (a panic in getSize() while measuring the DB size after the final 50M-point write).
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 12m14.22774633s (7.342277 microsecond per point)
Querying 100000000 points took 6m1.96092097s (3.619609 microseconds per point)
Size: 7.6G
Took 6m23.96630963s to delete 50000000 points
Took 1.048us to compact
Querying 50000000 points took 4m0.083501501s (4.801670 microseconds per point)
Size: 7.6G
Writing 50000000 points in batches of 1000 points took 1h28m28.45283235s (106.169057 microsecond per point)
Size: 8.2G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 39m32.39700747s (23.723970 microsecond per point)
Querying 100000000 points took 3m6.89910029s (1.868991 microseconds per point)
Size: 2.7G
Took 5m39.404872895s to delete 50000000 points
Took 6m14.918991943s to compact
Querying 50000000 points took 1m0.488077474s (1.209762 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m28.047675968s (19.760954 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h45m57.166233904s (135.571662 microsecond per point)
Querying 100000000 points took 3m3.470915689s (1.834709 microseconds per point)
Size: 41G
Took 8m33.237626533s to delete 50000000 points
Took 3m47.826396787s to compact
Querying 50000000 points took 51.101206202s (1.022024 microseconds per point)
Size: 41G
Writing 50000000 points in batches of 1000 points took 2h55m7.684545292s (210.153691 microsecond per point)
panic: exit status 1
goroutine 1 [running]:
runtime.panic(0x7cfc80, 0xc2100d3778)
/usr/local/go/src/pkg/runtime/panic.c:266 +0xb6
main.getSize(0xc210117b80, 0x1a, 0x1a, 0x4)
/home/software/influxdb/src/tools/benchmark-storage/main.go:54 +0x130
main.benchmarkDbCommon(0x7f9114265198, 0xc21001fc00, 0x5f5e100, 0x3e8, 0x7a120, ...)
/home/software/influxdb/src/tools/benchmark-storage/main.go:97 +0x811
main.benchmark(0x7eb180, 0x7, 0x5f5e100, 0x3e8, 0x7a120, ...)
/home/software/influxdb/src/tools/benchmark-storage/main.go:47 +0x269
main.main()
/home/software/influxdb/src/tools/benchmark-storage/main.go:32 +0x454
goroutine 3 [chan receive]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc2100492c0, 0x7f9114255fe8, 0xc210000008)
/home/software/influxdb/src/code.google.com/p/log4go/termlog.go:27 +0x60
created by code.google.com/p/log4go.NewConsoleLogWriter
/home/software/influxdb/src/code.google.com/p/log4go/termlog.go:19 +0x67
goroutine 4 [syscall]:
runtime.goexit()
/usr/local/go/src/pkg/runtime/proc.c:1394
goroutine 6 [finalizer wait]:
runtime.park(0x5d8210, 0xc68380, 0xc571a8)
/usr/local/go/src/pkg/runtime/proc.c:1342 +0x66
runfinq()
/usr/local/go/src/pkg/runtime/mgc0.c:2279 +0x84
runtime.goexit()
/usr/local/go/src/pkg/runtime/proc.c:1394
Command exited with non-zero status 2
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 21330.49
System time (seconds): 8025.84
Percent of CPU this job got: 78%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:21:55
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 17911936
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 22018495
Minor (reclaiming a frame) page faults: 52586768
Voluntary context switches: 884225141
Involuntary context switches: 12885920
Swaps: 0
File system inputs: 401123272
File system outputs: 747221368
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
This is the LMDB result using the Sorted Duplicates patches:
https://github.com/influxdb/influxdb/pull/678
I don't have the RocksDB result yet; it will be several more hours before that finishes.
... Updated: finally finished.
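The core of the change is that all points for a series are stored as sorted duplicates of a single key, instead of one key per (series, timestamp) pair. A minimal C sketch of that write path, assuming made-up key/value encodings, a "points" DB name, and a local ./testdb directory (none of which are InfluxDB's actual schema):

#include <stdio.h>
#include <string.h>
#include "lmdb.h"

int main(void) {
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_val key, val;
    char series[] = "series-000042";            /* key: one entry per series (made up) */
    char point[]  = "00000001403470000|3.14";   /* value: timestamp-prefixed point (made up) */
    int rc;

    rc = mdb_env_create(&env);
    if (rc) return 1;
    mdb_env_set_maxdbs(env, 4);                 /* allow a named "points" DB */
    mdb_env_set_mapsize(env, 10737418240UL);    /* 10GB map, matching the runs above */
    rc = mdb_env_open(env, "./testdb", 0, 0664);/* directory must already exist */
    if (rc) { fprintf(stderr, "open: %s\n", mdb_strerror(rc)); return 1; }

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) { mdb_env_close(env); return 1; }
    /* MDB_DUPSORT keeps every point for a series as a sorted duplicate of one key,
       so the per-point key overhead of a (series,timestamp) composite key goes away. */
    rc = mdb_dbi_open(txn, "points", MDB_DUPSORT | MDB_CREATE, &dbi);
    if (rc) { mdb_txn_abort(txn); mdb_env_close(env); return 1; }

    key.mv_data = series; key.mv_size = strlen(series);
    val.mv_data = point;  val.mv_size = strlen(point);
    rc = mdb_put(txn, dbi, &key, &val, 0);      /* duplicates stay sorted by value */
    if (rc) { fprintf(stderr, "put: %s\n", mdb_strerror(rc)); mdb_txn_abort(txn); }
    else rc = mdb_txn_commit(txn);

    mdb_env_close(env);
    return rc ? 1 : 0;
}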
violino:/home/software/influxdb> ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 9m55.060557828s (5.950606 microsecond per point)
Querying 100000000 points took 1m26.153283998s (0.861533 microseconds per point)
Size: 4.0G
Took 1m11.705748913s to delete 50000000 points
Took 1.257us to compact
Querying 50000000 points took 43.994534804s (0.879891 microseconds per point)
Size: 4.0G
Writing 50000000 points in batches of 1000 points took 5m32.398039417s (6.647961 microsecond per point)
Size: 5.9G
################ Benchmarking: leveldb
Writing 100000000 points in batches of 1000 points took 40m8.701727125s (24.087017 microsecond per point)
Querying 100000000 points took 3m39.413232183s (2.194132 microseconds per point)
Size: 2.7G
Took 17m48.421502672s to delete 50000000 points
Took 6m13.689504673s to compact
Querying 50000000 points took 1m1.125226854s (1.222505 microseconds per point)
Size: 1.4G
Writing 50000000 points in batches of 1000 points took 16m21.570047473s (19.631401 microsecond per point)
Size: 2.6G
################ Benchmarking: rocksdb
Writing 100000000 points in batches of 1000 points took 3h10m25.346725469s (114.253467 microsecond per point)
Querying 100000000 points took 2m26.002405473s (1.460024 microseconds per point)
Size: 35G
Took 16m40.54319908s to delete 50000000 points
Took 3m3.3481798s to compact
Querying 50000000 points took 58.448312524s (1.168966 microseconds per point)
Size: 36G
Writing 50000000 points in batches of 1000 points took 2h11m27.871520367s (157.757430 microsecond per point)
Size: 59G
################ Benchmarking: hyperleveldb
Writing 100000000 points in batches of 1000 points took 9m10.276314813s (5.502763 microsecond per point)
Querying 100000000 points took 12m8.949611018s (7.289496 microseconds per point)
Size: 3.3G
Took 5m11.934801159s to delete 50000000 points
Took 10m31.038632478s to compact
Querying 50000000 points took 1m24.106956728s (1.682139 microseconds per point)
Size: 1.6G
Writing 50000000 points in batches of 1000 points took 4m30.184909667s (5.403698 microsecond per point)
Size: 3.4G
LMDB doesn't have the plethora of complex tuning APIs that other databases do, but it *does* have some worthwhile
data access features that other databases don't. Learning to use them correctly is well worth the trouble.
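For example, once points live as duplicates under their series key, reading a series back is a single cursor walk rather than a range scan over composite keys. A hedged sketch (the dump_series helper and its key encoding are illustrative, not the benchmark code):

#include <stdio.h>
#include <string.h>
#include "lmdb.h"

/* Print every point stored for one series key in a DUPSORT DB. */
int dump_series(MDB_env *env, MDB_dbi dbi, const char *series) {
    MDB_txn *txn;
    MDB_cursor *cur;
    MDB_val key, data;
    size_t ndups;
    int rc;

    rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    if (rc) return rc;
    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) { mdb_txn_abort(txn); return rc; }

    key.mv_data = (void *)series;
    key.mv_size = strlen(series);

    /* Position on the series key; its duplicates are then one MDB_NEXT_DUP walk away. */
    rc = mdb_cursor_get(cur, &key, &data, MDB_SET_KEY);
    if (rc == 0) {
        mdb_cursor_count(cur, &ndups);          /* number of points for this series */
        printf("%s: %zu points\n", series, ndups);
        do {
            printf("  %.*s\n", (int)data.mv_size, (char *)data.mv_data);
        } while (mdb_cursor_get(cur, &key, &data, MDB_NEXT_DUP) == 0);
    }
    mdb_cursor_close(cur);
    mdb_txn_abort(txn);                         /* read-only txn, nothing to commit */
    return rc == MDB_NOTFOUND ? 0 : rc;
}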
violino:~/OD/mdb/libraries/liblmdb> ls -l /home/test/db/
total 119
drwxr-xr-x 2 hyc hyc 18904 Jun 22 22:31 test-hyperleveldb
drwxr-xr-x 2 hyc hyc 43128 Jun 22 15:56 test-leveldb
drwxr-xr-x 2 hyc hyc 96 Jun 22 14:01 test-lmdb
drwxr-xr-x 2 hyc hyc 60112 Jun 22 21:43 test-rocksdb
violino:~/OD/mdb/libraries/liblmdb> du !$
du /home/test/db/
3568158 /home/test/db/test-hyperleveldb
6084152 /home/test/db/test-lmdb
61190722 /home/test/db/test-rocksdb
2689903 /home/test/db/test-leveldb
73532934 /home/test/db/
The data files were not touched after running the test, so the directory timestamps show when each test finished: LevelDB didn't finish until almost 2 hours after the LMDB test ended, the RocksDB test ended almost 6 hours after LevelDB, and HyperLevelDB took about another 45 minutes after that.
Added a new -c (compact) option to mdb_copy, which copies the DB sequentially, omitting freed/deleted pages.
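In LMDB releases that ship this feature, the same compacting copy is also available programmatically via mdb_env_copy2() with the MDB_CP_COMPACT flag. A minimal sketch, reusing the paths from the session below:

#include <stdio.h>
#include "lmdb.h"

/* Same operation as `mdb_copy -c`: a sequential copy that renumbers pages and
   leaves freed pages behind. The destination directory must already exist. */
int main(void) {
    MDB_env *env;
    int rc;

    rc = mdb_env_create(&env);
    if (rc) { fprintf(stderr, "create: %s\n", mdb_strerror(rc)); return 1; }
    mdb_env_set_mapsize(env, 10737418240UL);    /* match the source env's 10GB map */
    rc = mdb_env_open(env, "/home/test/db/test-lmdb", MDB_RDONLY, 0664);
    if (rc) { fprintf(stderr, "open: %s\n", mdb_strerror(rc)); return 1; }

    rc = mdb_env_copy2(env, "/home/test/db/x", MDB_CP_COMPACT);
    if (rc) fprintf(stderr, "copy: %s\n", mdb_strerror(rc));

    mdb_env_close(env);
    return rc ? 1 : 0;
}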
Starting with the usual run, interrupt it after half the records are deleted:
violino:/home/software/influxdb> /usr/bin/time -v ./benchmark-storage -path=/home/test/db -points=100000000 -series=500000
################ Benchmarking: lmdb
Writing 100000000 points in batches of 1000 points took 10m18.8538945s (6.188539 microsecond per point)
Querying 100000000 points took 1m28.581634191s (0.885816 microseconds per point)
Size: 4.0G
Took 1m13.593047399s to delete 50000000 points
Took 1.118us to compact
^CCommand exited with non-zero status 2
Command being timed: "./benchmark-storage -path=/home/test/db -points=100000000 -series=500000"
User time (seconds): 845.65
System time (seconds): 74.44
Percent of CPU this job got: 104%
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:42.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16497568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 51
Minor (reclaiming a frame) page faults: 7521009
Voluntary context switches: 6036856
Involuntary context switches: 87408
Swaps: 0
File system inputs: 12152
File system outputs: 60187944
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 2
Then check how compaction behaves:
violino:~/OD/mdb/libraries/liblmdb> /usr/bin/time -v ./mdb_copy -c /home/test/db/test-lmdb/ /home/test/db/x
Command being timed: "./mdb_copy -c /home/test/db/test-lmdb/ /home/test/db/x"
User time (seconds): 1.56
System time (seconds): 6.23
Percent of CPU this job got: 10%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:15.60
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 16255568
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 1016071
Voluntary context switches: 12714
Involuntary context switches: 1924
Swaps: 0
File system inputs: 600
File system outputs: 8141848
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
violino:~/OD/mdb/libraries/liblmdb> du /home/test/db
4065106 /home/test/db/x
4128788 /home/test/db/test-lmdb
8193894 /home/test/db
Doesn't make a huge difference in space, since only ~14,000 freed pages were in the DB to begin with:
violino:~/OD/mdb/libraries/liblmdb> ./mdb_stat -ef /home/test/db/test-lmdb/
Environment Info
Map address: (nil)
Map size: 10737418240
Page size: 4096
Max pages: 2621440
Number of pages used: 1029636
Last transaction ID: 600001
Max readers: 126
Number of readers used: 0
Freelist Status
Tree depth: 2
Branch pages: 1
Leaf pages: 40
Overflow pages: 0
Entries: 1395
Free pages: 14310
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 50000000
violino:~/OD/mdb/libraries/liblmdb> ./mdb_stat -ef /home/test/db/x
Environment Info
Map address: (nil)
Map size: 10737418240
Page size: 4096
Max pages: 2621440
Number of pages used: 1015285
Last transaction ID: 1
Max readers: 126
Number of readers used: 0
Freelist Status
Tree depth: 0
Branch pages: 0
Leaf pages: 0
Overflow pages: 0
Entries: 0
Free pages: 0
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 50000000
The test is a bit awkward here too, since it deletes entries from the middle of the DB. If you were truly expiring records from a time-series database, you would delete from the head of the DB. Deleting from the middle like this leaves a lot of pages half full, instead of totally emptying and freeing them.
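A sketch of what head-expiry could look like with a cursor, assuming keys sort in time order (a simplification; it is not the benchmark's layout, and the function name and cutoff comparison are mine):

#include <string.h>
#include "lmdb.h"

/* Delete records from the head of the DB until the first key >= cutoff. */
int expire_head(MDB_env *env, MDB_dbi dbi, const void *cutoff, size_t cutlen) {
    MDB_txn *txn;
    MDB_cursor *cur;
    MDB_val key, data;
    int rc;

    rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;
    rc = mdb_cursor_open(txn, dbi, &cur);
    if (rc) { mdb_txn_abort(txn); return rc; }

    while ((rc = mdb_cursor_get(cur, &key, &data, MDB_FIRST)) == 0) {
        size_t n = key.mv_size < cutlen ? key.mv_size : cutlen;
        if (memcmp(key.mv_data, cutoff, n) >= 0)
            break;                      /* reached data that is still live */
        /* Deleting whole leading keys empties and frees entire pages; with
           MDB_DUPSORT, passing MDB_NODUPDATA here drops all of a key's duplicates at once. */
        rc = mdb_cursor_del(cur, 0);
        if (rc) break;
    }
    mdb_cursor_close(cur);
    if (rc && rc != MDB_NOTFOUND) { mdb_txn_abort(txn); return rc; }
    return mdb_txn_commit(txn);
}

In practice the deletes would be batched across several transactions rather than done in one huge one.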
@nodtem66
Thanks for sharing the benchmarks. I'm looking into Influx optimization.
