Skip to content

Instantly share code, notes, and snippets.

@Bodacious
Last active March 16, 2024 23:01
Show Gist options
  • Save Bodacious/568248cf1d1453b96e0e50aea8664856 to your computer and use it in GitHub Desktop.
Save Bodacious/568248cf1d1453b96e0e50aea8664856 to your computer and use it in GitHub Desktop.
Benchmarking different SEMVER ordering strategies in PostgreSQL

Semantic versioning DB index strategy

What's the most performant way of storing and retrieving semantic versioning strings in a DB, where records are ordered by their semantic version string.

Strategy A: Indexed String

A simple string column with an index. Will only return the records in the correct order if broken into an array—unlikely to use the index.

Strategy B: Split (composite) index on maj, min, patch

Splits the version string into its composite parts on separate DB columns. Indexes them as a composite index (maj, min, patch)

Strategy C: Split (composite) index on patch, min, maj

Same concept as B, but reverses the order (testing increased cardinality).

Strategy D: Integer Array

Splits the version string into its composite parts and stores them in a separate column of type Array

Strategy E: Integer Array (indexed)

Same as D, but with an index on the column

Strategy F: Padded Integer

Stores the version number as an integer on the database. Zero-pads the composite parts, so that the original values can be inferred from the integer (e.g. 1.23.2 => 001023002). Should be fast, but requires additional attribute type in the Rails model

require 'bundler/inline'
##
# Get dependencies
gemfile do
source 'https://rubygems.org'
gem 'pg'
gem 'activerecord', require: 'active_record'
gem 'benchmark-ips'
gem 'pry'
gem 'minitest'
end
##
# Ensure DB exists
ActiveRecord::Base.yield_self do |base|
base.establish_connection(
adapter: 'postgresql',
host: 'localhost',
database: 'postgres'
)
begin
base.connection.create_database('test_stuff')
rescue ActiveRecord::DatabaseAlreadyExists
nil
end
base.establish_connection(
adapter: 'postgresql',
host: 'localhost',
database: 'test_stuff'
)
end
##
# Define the schema
ActiveRecord::Schema.define do
create_table('side_a', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.string 'version'
end
create_table('side_b', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.string 'version'
t.integer 'version_major'
t.integer 'version_minor'
t.integer 'version_patch'
end
create_table('side_c', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.string 'version'
t.integer 'version_major'
t.integer 'version_minor'
t.integer 'version_patch'
end
create_table('side_d', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.string 'version'
t.integer 'version_arr', array: true
end
create_table('side_e', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.string 'version'
t.integer 'version_arr', array: true
end
create_table('side_f', id: :bigint, primary_key: :id, force: true) do |t|
t.string 'name'
t.integer 'version'
end
add_index :side_a, :version
add_index :side_b, [:version_major, :version_minor, :version_patch], name: :version
add_index :side_c, [:version_patch, :version_minor, :version_major], name: :version_rev
add_index :side_e, [:version_arr]
add_index :side_f, :version
end
class SideA < ActiveRecord::Base
self.table_name = 'side_a'
scope :ordered, -> { order(Arel.sql("string_to_array(version, '.')::int[]")) }
end
class SideB < ActiveRecord::Base
self.table_name = 'side_b'
scope :ordered, -> { order('version_major ASC, version_minor ASC, version_patch ASC') }
def version=(value)
super
self.version_major, self.version_minor, self.version_patch = *value.split('.')
end
end
class SideC < ActiveRecord::Base
self.table_name = 'side_c'
scope :ordered, -> { order('version_major ASC, version_minor ASC, version_patch ASC') }
def version=(value)
super
self.version_major, self.version_minor, self.version_patch = *value.split('.')
end
end
class SideD < ActiveRecord::Base
self.table_name = 'side_d'
scope :ordered, -> { order('version_arr ASC') }
def version=(value)
super
self.version_arr = *value.split('.')
end
end
class SideE < ActiveRecord::Base
self.table_name = 'side_e'
scope :ordered, -> { order('version_arr ASC') }
def version=(value)
super
self.version_arr = *value.split('.')
end
end
##
# Define a custom attribute type to encode Semver Strings as zero-padded integers
# e.g. "1.23.56" => 001023056
class SemverType < ActiveRecord::Type::Value
def serialize(value)
return value if value.is_a?(Integer)
value.to_s.split('.').map { _1.rjust(3, '0') }.join.to_i
end
def deserialize(value)
value.to_s.rjust(9, '0').chars.each_slice(3).map(&:join).map(&:to_i).join('.')
end
def type
:integer
end
def assert_valid_value(value)
value.to_s.match(/\d{1,3}\.\d{1,3}\.\d{1,3}/)
end
end
ActiveRecord::Type.register(:semver, SemverType)
class SideF < ActiveRecord::Base
self.table_name = 'side_f'
attribute :version, :semver
scope :ordered, -> { order('version') }
end
##
# Build a bunch of fake data
maj_versions = (0..2).to_a
min_versions = (0..12).to_a
patch_versions = (0..15).to_a
all_version_strings = maj_versions.product(min_versions, patch_versions).map { _1.join('.') }
document_names = 10.times.map.with_index(1) { "Document #{_2}" }
names_and_versions = document_names.product(all_version_strings)
attributes = names_and_versions.map do |(name, version_string)|
{
name: name,
version: version_string,
}
end
##
# Populate the tables
puts "populating side_a"
SideA.create(attributes)
puts "populating side_b"
SideB.create(attributes)
puts "populating side_c"
SideC.create(attributes)
puts "populating side_d"
SideD.create(attributes)
puts "populating side_e"
SideE.create(attributes)
puts "populating side_f"
SideF.create(attributes)
work = -> (klass) {
klass.ordered.limit(100).load
klass.ordered.count
}
##
# Run the benchmark
Benchmark.ips do |test|
test.report('A: Indexed String') { work.(SideA) }
test.report('B: Split index maj,min,patch') { work.(SideB) }
test.report('C: Split index patch,min,max') { work.(SideC) }
test.report('D: Integer Array') { work.(SideD) }
test.report('E: Integer Array indexed') { work.(SideE) }
test.report('F: Padded integer') { work.(SideF) }
end
ActiveRecord::Base.connection.yield_self do |connection|
connection.disconnect!
end
Warming up --------------------------------------
A: Indexed String 9.000 i/100ms
B: Split index maj,min,patch
51.000 i/100ms
C: Split index patch,min,max
29.000 i/100ms
D: Integer Array 26.000 i/100ms
E: Integer Array indexed
53.000 i/100ms
F: Padded integer 56.000 i/100ms
Calculating -------------------------------------
A: Indexed String 93.109 (± 3.2%) i/s - 468.000 in 5.031195s
B: Split index maj,min,patch
529.138 (± 5.5%) i/s - 2.652k in 5.027388s
C: Split index patch,min,max
295.089 (± 5.8%) i/s - 1.479k in 5.029322s
D: Integer Array 260.080 (± 6.2%) i/s - 1.300k in 5.019239s
E: Integer Array indexed
536.049 (± 5.6%) i/s - 2.703k in 5.059193s
F: Padded integer 555.127 (± 5.9%) i/s - 2.800k in 5.063651s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment