(from original author) For a project I am working on, I need a key-value store that maps file paths to fixnum IDs. The dataset will typically contain between 100,000 and 1,000,000 entries; these tests use 305,000 file-path-to-ID pairs.
The key-value stores tested are:

- Daybreak: "Daybreak is a simple and very fast key value store for ruby"
- GDBM: GNU dbm, "a simple database engine for storing key-value pairs on disk."
- DBM: "The DBM class provides a wrapper to a Unix-style dbm or Database Manager library"
- PStore: "PStore implements a file based persistence mechanism based on a Hash."

Of these, all except Daybreak are in the Ruby standard library.
Updated May 03 2017 by wjordan: this test was run on an Intel Core i7-6700 with 32 GB of RAM running Ubuntu 17.04.
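Before the benchmark itself, here is a minimal standalone sketch of the task being measured, using `PStore` from the standard library. The database filename and the paths are arbitrary, chosen just for illustration:

```ruby
require "pstore"
require "tmpdir"

# Sketch of the path -> fixnum ID mapping task using PStore.
# The database file lives in the temp directory; the name is arbitrary.
db_file = File.join(Dir.tmpdir, "path_ids.pstore")
store = PStore.new(db_file)

# All writes happen inside a transaction, which commits atomically.
store.transaction do
  store["/docs/report1.pdf"] = 1
  store["/docs/report2.pdf"] = 2
end

# Passing true opens a read-only transaction (no commit on exit).
id = nil
store.transaction(true) { id = store["/docs/report2.pdf"] }
puts id # => 2

File.delete(db_file)
```

Each `transaction` call serializes and rewrites the entire file, which is why the benchmark below wraps all inserts in a single transaction.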
#!/usr/bin/env ruby
# Benchmarking different DB systems for loading 305_000 file paths.
require "benchmark"
require "fileutils"
require "daybreak"
require "pstore"
require "gdbm"
require "dbm"
require 'benchmark/ips'
COUNT = 305_000
main_path = File.join(Dir.pwd, "test_file")
testdatat = (1..COUNT).map {|e| ["#{main_path}#{e}.pdf", e.to_s]}
def delete_files
  FileUtils.rm("testDaybreak.db") if File.file?("testDaybreak.db")
  FileUtils.rm("testDaybreak-sync.db") if File.file?("testDaybreak-sync.db")
  FileUtils.rm("testGDBM.db") if File.file?("testGDBM.db")
  FileUtils.rm("testDBM.db") if File.file?("testDBM.db")
  FileUtils.rm("testDBM.dir") if File.file?("testDBM.dir")
  FileUtils.rm("testDBM.pag") if File.file?("testDBM.pag")
  FileUtils.rm("testPStore.db") if File.file?("testPStore.db")
rescue StandardError => e
  puts "Error when deleting: "
  puts e.message
  puts e.backtrace.inspect
end
delete_files
class DaybreakWrapper
  def initialize(filename = "testDaybreak.db")
    @filename = filename
    # Use Daybreak's blocking set! writes for the "-sync" test database;
    # all other instances write asynchronously.
    @sync = filename.include?("sync")
    @store = Daybreak::DB.new(filename, serializer: Daybreak::Serializer::None)
  end

  def []=(key, val)
    if @sync
      @store.set!(key, val)
    else
      @store[key] = val
    end
  end

  def [](key)
    @store[key]
  end

  def values
    # Daybreak exposes no public #values, so reach into its internal table.
    @store.instance_variable_get(:@table).values
  end

  def keys
    @store.keys
  end

  def delete(key)
    @store.delete(key)
  end

  def compact
    @store.compact
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm(@filename)
  end

  def sync_lock
  end

  def flush
    @store.flush
  end
end
class GDBMWrapper
  def initialize
    @store = GDBM.new("testGDBM.db")
  end

  # GDBM stores only strings, so keys and values are marshaled.
  def []=(key, val)
    @store[Marshal.dump(key)] = Marshal.dump(val)
  end

  def [](key)
    Marshal.load(@store[Marshal.dump(key)])
  end

  def values
    # Returned still marshaled; the benchmark only counts them.
    @store.values
  end

  def keys
    @store.keys.map {|e| Marshal.load(e)}
  end

  def delete(key)
    @store.delete(Marshal.dump(key))
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm("testGDBM.db")
  end

  def sync_lock
  end
end
class DBMWrapper
  def initialize
    # @store = DBM.open("testDBM", 0666, DBM::WRCREAT)
    @store = DBM.new("testDBM")
  end

  def []=(key, val)
    @store[key] = val
  end

  def [](key)
    @store[key]
  end

  def values
    @store.values
  end

  def keys
    @store.keys
  end

  def delete(key)
    @store.delete(key)
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm("testDBM.db")
  end

  def sync_lock
  end
end
class PStoreWrapper
  def initialize
    @store = PStore.new("testPStore.db")
  end

  def []=(key, val)
    transaction do
      @store[key] = val
    end
  end

  def [](key)
    transaction do
      @store[key]
    end
  end

  def values
    transaction do
      @store.roots.map {|e| @store[e]}
    end
  end

  def keys
    transaction do
      @store.roots
    end
  end

  def delete(key)
    transaction do
      @store.delete(key)
    end
  end

  def stop
  end

  def destroy
    stop
    FileUtils.rm("testPStore.db")
  end

  def sync_lock
    @store.transaction do
      yield
    end
  end

  # Public: Creates a transaction. Nested transactions are allowed.
  #
  # Returns the block's value.
  def transaction
    if @in_transaction
      yield
    else
      @in_transaction = true
      begin
        sync_lock {yield}
      ensure
        @in_transaction = false
      end
    end
  end
end
class HashWrapper
  @@superhash = {}

  def []=(key, val)
    @@superhash[key] = val
  end

  def [](key)
    @@superhash[key]
  end

  def values
    @@superhash.values
  end

  def keys
    @@superhash.keys
  end

  def stop
  end
end
# require "pry-byebug"
n = 50_000
daybreak = DaybreakWrapper.new
daybreak_sync = DaybreakWrapper.new('testDaybreak-sync.db')
gdbm = GDBMWrapper.new
dbm = DBMWrapper.new
pstore = PStoreWrapper.new
hash = HashWrapper.new
Benchmark.ips do |x|
  x.warmup = 0
  x.time = 10
  x.report("daybreak insert:") {testdatat.each {|v| daybreak[v[0]] = v[1]}}
  x.report("day-sync insert:") {testdatat.each {|v| daybreak_sync[v[0]] = v[1]}; daybreak_sync.flush}
  x.report("gdbm insert:") {testdatat.each {|v| gdbm[v[0]] = v[1]}}
  x.report("dbm insert:") {testdatat.each {|v| dbm[v[0]] = v[1]}}
  x.report("PStore insert:") {pstore.transaction {testdatat.each {|v| pstore[v[0]] = v[1]}}}
  x.report("hash insert:") {testdatat.each {|v| hash[v[0]] = v[1]}}
  x.report("daybreak read: ") {n.times {daybreak[testdatat.sample[0]]}}
  x.report("day-sync read: ") {n.times {daybreak_sync[testdatat.sample[0]]}}
  x.report("gdbm read: ") {n.times {gdbm[testdatat.sample[0]]}}
  x.report("dbm read: ") {n.times {dbm[testdatat.sample[0]]}}
  x.report("PStore read: ") {pstore.transaction {n.times {pstore[testdatat.sample[0]]}}}
  x.report("hash read: ") {n.times {hash[testdatat.sample[0]]}}
  x.report("daybreak keys: ") {raise "Key error in daybreak" unless daybreak.keys.count == COUNT}
  x.report("day-sync keys: ") {raise "Key error in daybreak" unless daybreak_sync.keys.count == COUNT}
  x.report("gdbm keys: ") {raise "Key error in gdbm" unless gdbm.keys.count == COUNT}
  x.report("dbm keys: ") {raise "Key error in dbm" unless dbm.keys.count == COUNT}
  x.report("PStore keys: ") {raise "Key error in PStore" unless pstore.keys.count == COUNT}
  x.report("hash keys: ") {raise "Key error in hash" unless hash.keys.count == COUNT}
  x.report("daybreak values:") {raise "Value error in daybreak" unless daybreak.values.count == COUNT}
  x.report("day-sync values:") {raise "Value error in daybreak" unless daybreak_sync.values.count == COUNT}
  x.report("gdbm values:") {raise "Value error in gdbm" unless gdbm.values.count == COUNT}
  x.report("dbm values:") {raise "Value error in dbm" unless dbm.values.count == COUNT}
  x.report("PStore values:") {raise "Value error in PStore" unless pstore.values.count == COUNT}
  x.report("hash values:") {raise "Value error in hash" unless hash.values.count == COUNT}
end
puts Benchmark.measure('compact daybreak'){daybreak.compact}.format("%n: %10.2u sec\n")
puts Benchmark.measure('compact day-sync'){daybreak_sync.compact}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop daybreak'){daybreak.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop day-sync'){daybreak_sync.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop gdbm'){gdbm.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop dbm'){dbm.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop pstore'){pstore.stop}.format("%n: %10.2u sec\n")
def format_mb(size)
  units = ['b', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb']
  scale = 1024
  return "#{size} #{units[0]}" if size < 2 * scale

  size = size.to_f
  (2..7).each do |i|
    return "#{'%.3f' % (size / scale**(i - 1))} #{units[i - 1]}" if size < 2 * scale**i
  end
  "#{'%.3f' % (size / scale**6)} #{units[6]}"
end
puts "daybreak file size: #{format_mb(File.size("testDaybreak.db"))}" if File.file?("testDaybreak.db")
puts "day-sync file size: #{format_mb(File.size("testDaybreak-sync.db"))}" if File.file?("testDaybreak-sync.db")
puts "gdbm file size: #{format_mb(File.size("testGDBM.db"))}" if File.file?("testGDBM.db")
puts "dbm file size: #{format_mb(File.size("testDBM.db"))}" if File.file?("testDBM.db")
if File.file?("testDBM.dir") && File.file?("testDBM.pag")
puts "dbm file size: #{format_mb(File.size("testDBM.dir") + File.size("testDBM.pag"))}"
end
puts "PStore file size: #{format_mb(File.size("testPStore.db"))}" if File.file?("testPStore.db")
delete_files
puts "-------------------------------------------------------------"
daybreak insert: 6.313 (±31.7%) i/s - 48.000 in 10.111902s
day-sync insert: 0.178 (± 0.0%) i/s - 2.000 in 11.283191s
gdbm insert: 0.282 (± 0.0%) i/s - 3.000 in 10.988707s
dbm insert: 1.030 (± 0.0%) i/s - 11.000 in 10.683680s
PStore insert: 1.073 (±93.2%) i/s - 9.000 in 10.466471s
hash insert: 11.225 (± 8.9%) i/s - 112.000 in 10.018431s
daybreak read: 23.461 (± 4.3%) i/s - 235.000 in 10.023788s
day-sync read: 23.254 (± 4.3%) i/s - 233.000 in 10.029694s
gdbm read: 2.386 (±41.9%) i/s - 21.000 in 10.093796s
dbm read: 9.623 (± 0.0%) i/s - 95.000 in 9.887328s
PStore read: 1.169 (± 0.0%) i/s - 10.000 in 10.245758s
hash read: 24.070 (± 4.2%) i/s - 241.000 in 10.019297s
daybreak keys: 2.111k (±30.0%) i/s - 3.784k in 9.997594s
day-sync keys: 2.134k (±29.3%) i/s - 3.836k in 10.011280s
gdbm keys: 0.600 (± 0.0%) i/s - 6.000 in 10.723920s
dbm keys: 4.261 (±23.5%) i/s - 39.000 in 10.538305s
PStore keys: 1.134 (± 0.0%) i/s - 10.000 in 10.655226s
hash keys: 2.095k (±30.7%) i/s - 3.780k in 10.021090s
daybreak values: 2.127k (±29.1%) i/s - 3.794k in 10.016437s
day-sync values: 2.086k (±29.9%) i/s - 3.724k in 10.007764s
gdbm values: 0.574 (± 0.0%) i/s - 6.000 in 11.321620s
dbm values: 4.814 (± 0.0%) i/s - 48.000 in 10.009991s
PStore values: 0.998 (± 0.0%) i/s - 10.000 in 11.869013s
hash values: 2.079k (±30.6%) i/s - 4.003k in 10.019779s
compact daybreak: 46.72 sec
compact day-sync: 1.13 sec
stop daybreak: 0.00 sec
stop day-sync: 0.00 sec
stop gdbm: 0.00 sec
stop dbm: 0.00 sec
stop pstore: 0.00 sec
daybreak file size: 19.567 mb
day-sync file size: 19.567 mb
gdbm file size: 35.021 mb
dbm file size: 60.774 mb
PStore file size: 18.695 mb
As you can see:

- `daybreak` seems to be the fastest overall when inserts are done asynchronously, but the downside is that flushing all changes to disk takes a while after the test completes. It uses fairly little disk space once the journal is compacted, but it can take a large amount of space in the meantime. Despite being "pure ruby", Daybreak does not work on Windows because it relies on file locking that is only supported on POSIX systems.
- `dbm` has the issue that the stored file format depends heavily on the machine it was created on, so a database moved to another machine might not be readable at all.
- `PStore` performs well, but when the tests were not run inside a single `transaction`, performance was so bad I had to abort the execution. On the plus side, `PStore` uses fairly little disk space (the smallest in this test).
- `GDBM` works on all platforms (yes, even Windows) and performs well. Keep in mind that the keys and values have to go through `Marshal.dump`, and even with that overhead it holds up fairly well.
- The `hash` times are included as a simple comparison to the (to my knowledge) fastest in-memory alternative. As one would expect, the in-memory hash is far faster.
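The PStore transaction point can be illustrated directly. This standalone sketch (not part of the benchmark; the filename and write count are arbitrary) compares one transaction per write, which rewrites the whole file every time, against a single batched transaction:

```ruby
require "pstore"
require "benchmark"
require "tmpdir"

n = 500
path = File.join(Dir.tmpdir, "pstore-batch-demo.db")
File.delete(path) if File.file?(path)

# One transaction per write: the entire store is serialized and
# rewritten to disk n times.
slow = PStore.new(path)
per_write = Benchmark.realtime do
  n.times { |i| slow.transaction { slow["key#{i}"] = i } }
end

# All writes batched in a single transaction: one rewrite total.
File.delete(path)
batched = PStore.new(path)
one_txn = Benchmark.realtime do
  batched.transaction { n.times { |i| batched["key#{i}"] = i } }
end

puts format("per-write: %.3fs, batched: %.3fs", per_write, one_txn)
File.delete(path)
```

The gap widens with the size of the store, since each commit rewrites everything stored so far.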
Notes: There are multiple key-value stores left out of this test. It was meant as a comparison of some cross-platform alternatives to Daybreak, which I wrongly assumed to be cross-platform. Feel free to add any you feel are missing :)
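On the `Marshal.dump` point: GDBM and DBM can only store strings, so any other Ruby object has to be serialized on the way in and deserialized on the way out. A minimal standalone sketch of that round-trip, using a plain Hash in place of the byte-oriented store (the key and value here are made up for illustration):

```ruby
# A plain Hash stands in for a string-only store such as GDBM.
store = {}

key   = ["/docs/report1.pdf", :id] # a non-string key
value = 42                         # a non-string value

# Serialize both sides to byte strings before storing.
store[Marshal.dump(key)] = Marshal.dump(value)

# Lookup re-serializes the key, then deserializes the stored value.
loaded = Marshal.load(store[Marshal.dump(key)])
puts loaded # => 42
```

This works because `Marshal.dump` produces identical bytes for equal keys of these types, so the re-serialized key finds the original entry.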