Performance testing different Key-Value stores in Ruby

Introduction

(From the original author:) For a project I am working on, I need a key-value store that converts file paths to integer (Fixnum) IDs. The dataset will typically range from 100,000 to 1,000,000 entries. These tests map 305,000 file paths to Fixnum IDs.
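The mapping itself is straightforward: each file path becomes a key whose value is its ID stored as a string, which is also how the benchmark's test data is built. A minimal in-memory sketch (paths here are made up for illustration):

```ruby
# Build a sample mapping of file paths to ID strings,
# mirroring the shape of the benchmark's test data.
paths = (1..3).map { |i| "/tmp/test_file#{i}.pdf" }

index = {}
paths.each_with_index { |path, i| index[path] = (i + 1).to_s }

index["/tmp/test_file2.pdf"] # => "2"
```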

The Different Key-Value stores tested are:

  • Daybreak: "Daybreak is a simple and very fast key value store for ruby"
  • GDBM: GNU dbm, "a simple database engine for storing key-value pairs on disk."
  • DBM: "The DBM class provides a wrapper to a Unix-style dbm or Database Manager library"
  • PStore: "PStore implements a file based persistence mechanism based on a Hash."

Out of these, all except Daybreak are in the Ruby standard library.
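For readers unfamiliar with these libraries, PStore (the only pure-Ruby store in the standard library) is a good illustration of the transactional API the wrappers below have to accommodate; every read and write must happen inside a transaction. A minimal sketch (the filename is just a scratch name for this example):

```ruby
require 'pstore'

# All PStore access goes through a transaction block.
store = PStore.new('example.pstore')

store.transaction do
  store['/tmp/a.pdf'] = '1'
end

# Passing true opens a read-only transaction, which
# cannot modify the store.
id = store.transaction(true) { store['/tmp/a.pdf'] }
puts id # prints "1"
```

This per-operation transaction overhead is exactly why the benchmark's `PStoreWrapper` batches all 305,000 inserts inside a single transaction.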

Updated May 03 2017 by wjordan: this test was run on an Intel Core i7-6700 with 32 GB RAM running Ubuntu 17.04.

Test code:

#!/usr/bin/env ruby
#benchmarking different DB systems for load of 305_000 file paths.

require "benchmark"
require "fileutils"
require "daybreak"
require "pstore"
require "gdbm"
require "dbm"
require 'benchmark/ips'

COUNT = 305_000

main_path = File.join(Dir.pwd, "test_file")
testdatat = (1..COUNT).map {|e| ["#{main_path}#{e}.pdf", e.to_s]}

def delete_files
  begin
    FileUtils.rm("testDaybreak.db") if File.file?("testDaybreak.db")
    FileUtils.rm("testDaybreak-sync.db") if File.file?("testDaybreak-sync.db")
    FileUtils.rm("testGDBM.db") if File.file?("testGDBM.db")
    FileUtils.rm("testDBM.db") if File.file?("testDBM.db")
    FileUtils.rm("testDBM.dir") if File.file?("testDBM.dir")
    FileUtils.rm("testDBM.pag") if File.file?("testDBM.pag")
    FileUtils.rm("testPStore.db") if File.file?("testPStore.db")
  rescue StandardError => e
    puts "Error when deleting: "
    puts e.message
    puts e.backtrace.inspect
  end
end

delete_files


class DaybreakWrapper
  @store = nil

  def initialize(filename = "testDaybreak.db")
    @filename = filename
    @store = Daybreak::DB.new(filename, serializer: Daybreak::Serializer::None)
  end

  def []=(key, val)
    if @sync
      @store.set!(key, val)
    else
      @store[key] = val
    end
  end

  def [](key)
    @store[key]
  end

  def values
    @store.instance_variable_get(:@table).values
  end

  def keys
    @store.keys
  end

  def delete(key)
    @store.delete(key)
  end

  def compact
    @store.compact
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm(@filename)
  end

  def sync_lock
  end

  def flush
    @store.flush
  end
end
class GDBMWrapper
  @store = nil

  def initialize
    @store = GDBM.new("testGDBM.db")
  end

  def []=(key, val)
    @store[Marshal.dump(key)] = Marshal.dump(val)
  end

  def [](key)
    Marshal.load(@store[Marshal.dump(key)])
  end

  def values
    @store.values
  end

  def keys
    @store.keys.map {|e| Marshal.load(e)}
  end

  def delete(key)
    @store.delete(Marshal.dump(key))
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm("testGDBM.db")
  end

  def sync_lock
  end
end

class DBMWrapper
  @store = nil

  def initialize
    # @store = DBM.open("testDBM", 0666, DBM::WRCREAT)
    @store = DBM.new("testDBM")
  end

  def []=(key, val)
    @store[key] = val
  end

  def [](key)
    @store[key]
  end

  def values
    @store.values
  end

  def keys
    @store.keys
  end

  def delete(key)
    @store.delete(key)
  end

  def stop
    @store.close unless @store.closed?
  end

  def destroy
    stop
    FileUtils.rm("testDBM.db")
  end

  def sync_lock
  end
end

class PStoreWrapper
  @store = nil

  def initialize
    @store = PStore.new("testPStore.db")
  end

  def []=(key, val)
    transaction do
      @store[key] = val
    end
  end

  def [](key)
    transaction do
      @store[key]
    end
  end

  def values
    transaction do
      @store.roots.map {|e| @store[e]}
    end
  end

  def keys
    transaction do
      @store.roots
    end
  end

  def delete(key)
    transaction do
      @store.delete(key)
    end
  end

  def stop
    # transaction do
    # 	@store.commit
    # end
  end

  def destroy
    # transaction do
    # 	@store.destroy
    # end
    FileUtils.rm("testPStore.db")
  end

  def sync_lock
    @store.transaction do
      yield
    end
  end

  # Public: Creates a transaction. Nested transactions are allowed.
  #
  # Returns nothing.
  def transaction
    if @in_transaction
      yield
    else
      @in_transaction = true
      begin
        sync_lock {yield}
      ensure
        @in_transaction = false
      end
    end
  end
end

class HashWrapper
  @@superhash = {}

  def initialize
    #@@superhash = {} unless @@superhash
  end

  def []=(key, val)
    @@superhash[key] = val
  end

  def [](key)
    @@superhash[key]
  end

  def values
    @@superhash.values
  end

  def keys
    @@superhash.keys
  end

  def stop
  end
end

# require "pry-byebug"
n = 50000
daybreak = DaybreakWrapper.new
daybreak_sync = DaybreakWrapper.new('testDaybreak-sync.db')
gdbm = GDBMWrapper.new
dbm = DBMWrapper.new
pstore = PStoreWrapper.new
hash = HashWrapper.new
Benchmark.ips do |x|
  x.warmup = 0
  x.time = 10

  x.report("daybreak insert:") {testdatat.each {|v| daybreak[v[0]] = v[1]}}
  x.report("day-sync insert:") {testdatat.each {|v| daybreak_sync[v[0]] = v[1]}; daybreak_sync.flush}
  x.report("gdbm     insert:") {testdatat.each {|v| gdbm[v[0]] = v[1]}}
  x.report("dbm      insert:") {testdatat.each {|v| dbm[v[0]] = v[1]}}
  x.report("PStore   insert:") {pstore.transaction {testdatat.each {|v| pstore[v[0]] = v[1]}}}
  x.report("hash     insert:") {testdatat.each {|v| hash[v[0]] = v[1]}}

  x.report("daybreak read:  ") {n.times {daybreak[testdatat.sample[0]]}}
  x.report("day-sync read:  ") {n.times {daybreak_sync[testdatat.sample[0]]}}
  x.report("gdbm     read:  ") {n.times {gdbm[testdatat.sample[0]]}}
  x.report("dbm      read:  ") {n.times {dbm[testdatat.sample[0]]}}
  x.report("PStore   read:  ") {pstore.transaction {n.times {pstore[testdatat.sample[0]]}}}
  x.report("hash     read:  ") {n.times {hash[testdatat.sample[0]]}}

  x.report("daybreak keys:  ") {raise "Key error in daybreak" unless daybreak.keys.count == COUNT}
  x.report("day-sync keys:  ") {raise "Key error in daybreak" unless daybreak_sync.keys.count == COUNT}
  x.report("gdbm     keys:  ") {raise "Key error in gdbm" unless gdbm.keys.count == COUNT}
  x.report("dbm      keys:  ") {raise "Key error in dbm" unless dbm.keys.count == COUNT}
  x.report("PStore   keys:  ") {raise "Key error in PStore" unless pstore.keys.count == COUNT}
  x.report("hash     keys:  ") {raise "Key error in hash" unless hash.keys.count == COUNT}

  x.report("daybreak values:") {raise "Value error in daybreak" unless daybreak.values.count == COUNT}
  x.report("day-sync values:") {raise "Value error in daybreak" unless daybreak_sync.values.count == COUNT}
  x.report("gdbm     values:") {raise "Value error in gdbm" unless gdbm.values.count == COUNT}
  x.report("dbm      values:") {raise "Value error in dbm" unless dbm.values.count == COUNT}
  x.report("PStore   values:") {raise "Value error in PStore" unless pstore.values.count == COUNT}
  x.report("hash     values:") {raise "Value error in hash" unless hash.values.count == COUNT}
end
puts Benchmark.measure('compact daybreak'){daybreak.compact}.format("%n: %10.2u sec\n")
puts Benchmark.measure('compact day-sync'){daybreak_sync.compact}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop daybreak'){daybreak.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop day-sync'){daybreak_sync.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop gdbm'){gdbm.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop dbm'){dbm.stop}.format("%n: %10.2u sec\n")
puts Benchmark.measure('stop pstore'){pstore.stop}.format("%n: %10.2u sec\n")

# Formats a byte count using binary (1024-based) units.
def format_mb(size)
  units = %w[b kb mb gb tb pb eb]
  scale = 1024

  # Find the smallest unit such that the value stays below 2 of
  # the next unit, capping at the largest unit available.
  exp = (1..units.size).find { |n| size < 2 * (scale**n) } || units.size
  return "#{size} #{units[0]}" if exp == 1

  format('%.3f %s', size.to_f / (scale**(exp - 1)), units[exp - 1])
end

puts "daybreak file size: #{format_mb(File.size("testDaybreak.db"))}" if File.file?("testDaybreak.db")
puts "day-sync file size: #{format_mb(File.size("testDaybreak-sync.db"))}" if File.file?("testDaybreak-sync.db")
puts "gdbm     file size: #{format_mb(File.size("testGDBM.db"))}" if File.file?("testGDBM.db")
puts "dbm      file size: #{format_mb(File.size("testDBM.db"))}" if File.file?("testDBM.db")
if File.file?("testDBM.dir") && File.file?("testDBM.pag")
  puts "dbm      file size: #{format_mb(File.size("testDBM.dir") + File.size("testDBM.pag"))}"
end
puts "PStore   file size: #{format_mb(File.size("testPStore.db"))}" if File.file?("testPStore.db")

delete_files
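To reproduce the run, only `daybreak` and `benchmark-ips` need installing from RubyGems; the rest is standard library, although GDBM and DBM also require the corresponding C libraries on the system. A hypothetical Gemfile:

```ruby
source 'https://rubygems.org'

gem 'daybreak'       # journaled pure-Ruby key-value store
gem 'benchmark-ips'  # iterations-per-second benchmarking
```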

puts "-------------------------------------------------------------"

Results

    daybreak insert:      6.313  (±31.7%) i/s -     48.000  in  10.111902s
    day-sync insert:      0.178  (± 0.0%) i/s -      2.000  in  11.283191s
    gdbm     insert:      0.282  (± 0.0%) i/s -      3.000  in  10.988707s
    dbm      insert:      1.030  (± 0.0%) i/s -     11.000  in  10.683680s
    PStore   insert:      1.073  (±93.2%) i/s -      9.000  in  10.466471s
    hash     insert:     11.225  (± 8.9%) i/s -    112.000  in  10.018431s
    daybreak read:       23.461  (± 4.3%) i/s -    235.000  in  10.023788s
    day-sync read:       23.254  (± 4.3%) i/s -    233.000  in  10.029694s
    gdbm     read:        2.386  (±41.9%) i/s -     21.000  in  10.093796s
    dbm      read:        9.623  (± 0.0%) i/s -     95.000  in   9.887328s
    PStore   read:        1.169  (± 0.0%) i/s -     10.000  in  10.245758s
    hash     read:       24.070  (± 4.2%) i/s -    241.000  in  10.019297s
    daybreak keys:        2.111k (±30.0%) i/s -      3.784k in   9.997594s
    day-sync keys:        2.134k (±29.3%) i/s -      3.836k in  10.011280s
    gdbm     keys:        0.600  (± 0.0%) i/s -      6.000  in  10.723920s
    dbm      keys:        4.261  (±23.5%) i/s -     39.000  in  10.538305s
    PStore   keys:        1.134  (± 0.0%) i/s -     10.000  in  10.655226s
    hash     keys:        2.095k (±30.7%) i/s -      3.780k in  10.021090s
    daybreak values:      2.127k (±29.1%) i/s -      3.794k in  10.016437s
    day-sync values:      2.086k (±29.9%) i/s -      3.724k in  10.007764s
    gdbm     values:      0.574  (± 0.0%) i/s -      6.000  in  11.321620s
    dbm      values:      4.814  (± 0.0%) i/s -     48.000  in  10.009991s
    PStore   values:      0.998  (± 0.0%) i/s -     10.000  in  11.869013s
    hash     values:      2.079k (±30.6%) i/s -      4.003k in  10.019779s

compact daybreak: 46.72 sec
compact day-sync:  1.13 sec
stop daybreak:     0.00 sec
stop day-sync:     0.00 sec
stop gdbm:         0.00 sec
stop dbm:          0.00 sec
stop pstore:       0.00 sec

daybreak file size: 19.567 mb
day-sync file size: 19.567 mb
gdbm     file size: 35.021 mb
dbm      file size: 60.774 mb
PStore   file size: 18.695 mb
  • As you can see, Daybreak seems to be the fastest overall when inserts are done asynchronously, but the downside is that flushing all changes to disk takes a while after the test completes. It uses fairly little disk space once the journal is compacted, but can take a large amount of space in the meantime. Despite being "pure Ruby", Daybreak does not work on Windows because it relies on file locking that is only supported on POSIX systems.

  • dbm has the issue that the stored file format depends heavily on the machine it was created on, so a database moved to another machine may not be readable at all.

  • PStore seems to perform well; however, when the tests were not run inside one `db.transaction`, performance was so bad I had to abort the run. On the plus side, PStore seems to use fairly little disk space (the smallest in this test).

  • GDBM works on all platforms (yes, even Windows) and seems to perform well. Keep in mind that every key and value has to go through `Marshal.dump`, and even with that overhead it performs fairly well.

  • The Hash times are included as a simple baseline against the (to my knowledge) fastest in-memory alternative. As one would expect, the Hash is far faster.
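On the GDBM point above: GDBM can only store strings, which is why `GDBMWrapper` marshals both keys and values. The round-trip can be sketched with a plain Hash standing in for the store, so the sketch runs even without the gdbm library installed:

```ruby
# A plain Hash stands in for the GDBM store in this sketch.
# GDBM accepts only String keys and values, so everything is
# serialized with Marshal on the way in and out.
store = {}

key = '/tmp/test_file1.pdf'
val = 1

store[Marshal.dump(key)] = Marshal.dump(val)
restored = Marshal.load(store[Marshal.dump(key)])

puts restored # prints 1
```

Marshal.dump is deterministic for plain strings, which is what makes the marshaled key usable for lookups.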

Notes: There are multiple key-value stores left out of this test. It was meant as a comparison of some cross-platform alternatives to Daybreak, which I wrongly assumed to be cross-platform. Feel free to add any you feel are missing :)
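As one example of such an addition, `YAML::Store` from the standard library is a PStore subclass with the same transaction API, so a wrapper follows the same shape as `PStoreWrapper` above. The class name and filename here are hypothetical, and per-operation transactions are used for brevity rather than the batching the benchmark would want:

```ruby
require 'yaml/store'
require 'fileutils'

# Hypothetical wrapper for YAML::Store, mirroring PStoreWrapper.
class YAMLStoreWrapper
  def initialize(filename = 'testYAMLStore.yml')
    @filename = filename
    @store = YAML::Store.new(filename)
  end

  def []=(key, val)
    @store.transaction { @store[key] = val }
  end

  def [](key)
    @store.transaction(true) { @store[key] }
  end

  def keys
    @store.transaction(true) { @store.roots }
  end

  def destroy
    FileUtils.rm(@filename) if File.file?(@filename)
  end
end

w = YAMLStoreWrapper.new
w['/tmp/a.pdf'] = '1'
puts w['/tmp/a.pdf'] # prints "1"
```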
