Skip to content

Instantly share code, notes, and snippets.

@nevans
Last active March 24, 2021 17:27
Show Gist options
  • Save nevans/6473e9fe1bf95930c5382d86db863b5b to your computer and use it in GitHub Desktop.
Save nevans/6473e9fe1bf95930c5382d86db863b5b to your computer and use it in GitHub Desktop.
Benchmarks for some pure ruby priority queue implementations
# Run on an Intel(R) Core(TM) i7-1065G7
# using ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
# n.b. when running these benchmarks, I did not shut down other
# processes (including web browser) nor did I run the benchmark
# multiple times (using "--repeat-count"). The numbers would be
# slightly faster and more stable/reliable if I'd done that.
# Maybe (maybe) I'll re-run them later.
Warming up --------------------------------------
pushpop w/ size=3,162,278 256.112k i/s - 263.417k times in 1.028524s (3.90μs/i)
pushpop w/ size=1,000,000 283.858k i/s - 293.040k times in 1.032346s (3.52μs/i)
pushpop w/ size= 316,228 286.535k i/s - 316.712k times in 1.105318s (3.49μs/i)
pushpop w/ size= 100,000 291.138k i/s - 294.530k times in 1.011649s (3.43μs/i)
pushpop w/ size= 31,623 314.217k i/s - 335.852k times in 1.068854s (3.18μs/i)
pushpop w/ size= 10,000 326.015k i/s - 333.608k times in 1.023290s (3.07μs/i)
pushpop w/ size= 3,162 381.708k i/s - 405.600k times in 1.062592s (2.62μs/i)
pushpop w/ size= 1,000 441.730k i/s - 458.604k times in 1.038199s (2.26μs/i)
pushpop w/ size= 316 459.527k i/s - 477.636k times in 1.039407s (2.18μs/i)
pushpop w/ size= 100 622.711k i/s - 672.425k times in 1.079835s (1.61μs/i)
pushpop w/ size= 32 751.908k i/s - 774.423k times in 1.029944s (1.33μs/i)
pushpop w/ size= 10 1.085M i/s - 1.147M times in 1.057044s (921.95ns/i)
Calculating -------------------------------------
Timers gem BSearch Rb2Heap Rb4Heap
push N (N = 3,162,278) 3.416M ERROR 3.873M 5.206M i/s - 6.325M times in 1.851497s 0.000000s 1.633142s 1.214862s
push N (N = 1,000,000) 3.411M ERROR 4.288M 5.170M i/s - 6.000M times in 1.759272s 0.000000s 1.399344s 1.160469s
push N (N = 316,228) 2.981M ERROR 4.277M 5.402M i/s - 6.325M times in 2.121835s 0.000000s 1.478898s 1.170712s
push N (N = 100,000) 3.416M 200.902k 4.313M 5.529M i/s - 6.000M times in 1.756650s 29.865273s 1.391108s 1.085232s
push N (N = 31,623) 3.462M 563.896k 4.417M 5.863M i/s - 6.325M times in 1.826679s 11.215906s 1.431938s 1.078645s
push N (N = 10,000) 3.346M 1.117M 4.225M 5.560M i/s - 6.000M times in 1.793187s 5.370607s 1.420072s 1.079041s
push N (N = 3,162) 3.432M 1.588M 4.463M 5.997M i/s - 6.324M times in 1.842489s 3.982931s 1.417132s 1.054586s
push N (N = 1,000) 3.423M 1.775M 4.376M 5.855M i/s - 6.000M times in 1.752874s 3.379436s 1.371119s 1.024803s
push N (N = 316) 3.519M 2.236M 4.327M 5.863M i/s - 6.320M times in 1.795894s 2.826953s 1.460678s 1.077881s
push N (N = 100) 3.560M 2.635M 4.623M 5.806M i/s - 6.000M times in 1.685267s 2.276668s 1.297843s 1.033357s
push N (N = 32) 3.923M 3.372M 4.959M 5.938M i/s - 6.400M times in 1.631234s 1.898139s 1.290469s 1.077737s
push N (N = 10) 4.227M 4.258M 5.516M 6.313M i/s - 6.000M times in 1.419456s 1.409042s 1.087831s 0.950421s
pushpop w/ size=3,162,278 255.441k ERROR 337.429k 369.298k i/s - 768.334k times in 3.007869s 0.000000s 2.277026s 2.080526s
pushpop w/ size=1,000,000 270.840k ERROR 380.666k 376.737k i/s - 851.574k times in 3.144193s 0.000000s 2.237061s 2.260393s
pushpop w/ size= 316,228 267.331k 31.958k 393.732k 450.095k i/s - 859.604k times in 3.215501s 26.898024s 2.183222s 1.909827s
pushpop w/ size= 100,000 290.206k 228.682k 434.168k 459.020k i/s - 873.415k times in 3.009642s 3.819340s 2.011696s 1.902784s
pushpop w/ size= 31,623 310.594k 956.823k 456.219k 507.833k i/s - 942.650k times in 3.034994s 0.985187s 2.066222s 1.856220s
pushpop w/ size= 10,000 312.430k 1.528M 499.221k 570.330k i/s - 978.044k times in 3.130438s 0.640248s 1.959139s 1.714873s
pushpop w/ size= 3,162 366.672k 1.789M 567.663k 665.152k i/s - 1.145M times in 3.123019s 0.639921s 2.017259s 1.721598s
pushpop w/ size= 1,000 447.877k 1.984M 645.278k 785.424k i/s - 1.325M times in 2.958826s 0.668052s 2.053672s 1.687230s
pushpop w/ size= 316 486.638k 2.074M 724.439k 938.971k i/s - 1.379M times in 2.832870s 0.664546s 1.902964s 1.468182s
pushpop w/ size= 100 636.384k 2.566M 926.443k 949.502k i/s - 1.868M times in 2.935543s 0.727914s 2.016456s 1.967486s
pushpop w/ size= 32 763.645k 2.776M 1.103M 1.218M i/s - 2.256M times in 2.953889s 0.812695s 2.044571s 1.852504s
pushpop w/ size= 10 1.111M 3.614M 1.604M 1.617M i/s - 3.254M times in 2.927673s 0.900466s 2.028385s 2.012949s
push N, pop N (N=3,162,278) 553.322k ERROR 745.647k 812.798k i/s - 6.325M times in 11.430155s 0.000000s 8.481966s 7.781211s
push N, pop N (N=1,000,000) 631.875k ERROR 880.822k 905.588k i/s - 6.000M times in 9.495550s 0.000000s 6.811817s 6.625528s
push N, pop N (N= 316,228) 738.142k ERROR 980.327k 1.001M i/s - 6.325M times in 8.568217s 0.000000s 6.451480s 6.318905s
push N, pop N (N= 100,000) 834.547k 401.825k 1.095M 1.108M i/s - 6.000M times in 7.189532s 14.931876s 5.478814s 5.413644s
push N, pop N (N= 31,623) 904.562k 1.071M 1.227M 1.261M i/s - 6.325M times in 6.991890s 5.906839s 5.155305s 5.015108s
push N, pop N (N= 10,000) 1.026M 2.156M 1.391M 1.355M i/s - 6.000M times in 5.848943s 2.782963s 4.313542s 4.426944s
push N, pop N (N= 3,162) 1.153M 2.856M 1.537M 1.565M i/s - 6.324M times in 5.483464s 2.214409s 4.114142s 4.040375s
push N, pop N (N= 1,000) 1.315M 3.344M 1.757M 1.739M i/s - 6.000M times in 4.563054s 1.794399s 3.415147s 3.450763s
push N, pop N (N= 316) 1.556M 3.829M 2.015M 1.986M i/s - 6.320M times in 4.062785s 1.650756s 3.136332s 3.182902s
push N, pop N (N= 100) 1.905M 4.575M 2.382M 2.443M i/s - 6.000M times in 3.149714s 1.311386s 2.519311s 2.455555s
push N, pop N (N= 32) 2.402M 5.145M 2.936M 2.991M i/s - 6.400M times in 2.664925s 1.243916s 2.180104s 2.139775s
push N, pop N (N= 10) 3.305M 6.167M 3.921M 3.922M i/s - 6.000M times in 1.815305s 0.972956s 1.530235s 1.529652s
Comparison:
push N (N = 3,162,278)
Rb4Heap: 5205985.6 i/s
Rb2Heap: 3872629.7 i/s - 1.34x slower
Timers gem: 3415913.9 i/s - 1.52x slower
BSearch: 0.0 i/s
push N (N = 1,000,000)
Rb4Heap: 5170325.4 i/s
Rb2Heap: 4287723.7 i/s - 1.21x slower
Timers gem: 3410501.9 i/s - 1.52x slower
BSearch: 0.0 i/s
push N (N = 316,228)
Rb4Heap: 5402320.5 i/s
Rb2Heap: 4276535.2 i/s - 1.26x slower
Timers gem: 2980703.3 i/s - 1.81x slower
BSearch: 0.0 i/s
push N (N = 100,000)
Rb4Heap: 5528770.6 i/s
Rb2Heap: 4313109.2 i/s - 1.28x slower
Timers gem: 3415591.6 i/s - 1.62x slower
BSearch: 200902.2 i/s - 27.52x slower
push N (N = 31,623)
Rb4Heap: 5863469.5 i/s
Rb2Heap: 4416811.1 i/s - 1.33x slower
Timers gem: 3462349.2 i/s - 1.69x slower
BSearch: 563895.6 i/s - 10.40x slower
push N (N = 10,000)
Rb4Heap: 5560494.7 i/s
Rb2Heap: 4225137.8 i/s - 1.32x slower
Timers gem: 3345997.2 i/s - 1.66x slower
BSearch: 1117192.2 i/s - 4.98x slower
push N (N = 3,162)
Rb4Heap: 5996667.1 i/s
Rb2Heap: 4462533.0 i/s - 1.34x slower
Timers gem: 3432313.5 i/s - 1.75x slower
BSearch: 1587775.5 i/s - 3.78x slower
push N (N = 1,000)
Rb4Heap: 5854782.3 i/s
Rb2Heap: 4375989.0 i/s - 1.34x slower
Timers gem: 3422949.9 i/s - 1.71x slower
BSearch: 1775444.3 i/s - 3.30x slower
push N (N = 316)
Rb4Heap: 5863357.7 i/s
Rb2Heap: 4326757.3 i/s - 1.36x slower
Timers gem: 3519138.1 i/s - 1.67x slower
BSearch: 2235622.4 i/s - 2.62x slower
push N (N = 100)
Rb4Heap: 5806319.8 i/s
Rb2Heap: 4623056.5 i/s - 1.26x slower
Timers gem: 3560267.2 i/s - 1.63x slower
BSearch: 2635430.5 i/s - 2.20x slower
push N (N = 32)
Rb4Heap: 5938370.7 i/s
Rb2Heap: 4959436.4 i/s - 1.20x slower
Timers gem: 3923409.4 i/s - 1.51x slower
BSearch: 3371723.6 i/s - 1.76x slower
push N (N = 10)
Rb4Heap: 6312993.5 i/s
Rb2Heap: 5515562.1 i/s - 1.14x slower
BSearch: 4258213.7 i/s - 1.48x slower
Timers gem: 4226971.0 i/s - 1.49x slower
pushpop w/ size=3,162,278
Rb4Heap: 369297.9 i/s
Rb2Heap: 337428.7 i/s - 1.09x slower
Timers gem: 255441.3 i/s - 1.45x slower
BSearch: 0.0 i/s
pushpop w/ size=1,000,000
Rb2Heap: 380666.3 i/s
Rb4Heap: 376737.1 i/s - 1.01x slower
Timers gem: 270840.2 i/s - 1.41x slower
BSearch: 0.0 i/s
pushpop w/ size= 316,228
Rb4Heap: 450095.2 i/s
Rb2Heap: 393731.8 i/s - 1.14x slower
Timers gem: 267331.3 i/s - 1.68x slower
BSearch: 31957.9 i/s - 14.08x slower
pushpop w/ size= 100,000
Rb4Heap: 459019.5 i/s
Rb2Heap: 434168.4 i/s - 1.06x slower
Timers gem: 290205.6 i/s - 1.58x slower
BSearch: 228682.2 i/s - 2.01x slower
pushpop w/ size= 31,623
BSearch: 956823.2 i/s
Rb4Heap: 507833.1 i/s - 1.88x slower
Rb2Heap: 456219.1 i/s - 2.10x slower
Timers gem: 310593.7 i/s - 3.08x slower
pushpop w/ size= 10,000
BSearch: 1527601.3 i/s
Rb4Heap: 570330.3 i/s - 2.68x slower
Rb2Heap: 499221.3 i/s - 3.06x slower
Timers gem: 312430.4 i/s - 4.89x slower
pushpop w/ size= 3,162
BSearch: 1789476.5 i/s
Rb4Heap: 665151.7 i/s - 2.69x slower
Rb2Heap: 567663.2 i/s - 3.15x slower
Timers gem: 366672.1 i/s - 4.88x slower
pushpop w/ size= 1,000
BSearch: 1983661.6 i/s
Rb4Heap: 785423.6 i/s - 2.53x slower
Rb2Heap: 645278.2 i/s - 3.07x slower
Timers gem: 447877.0 i/s - 4.43x slower
pushpop w/ size= 316
BSearch: 2074469.9 i/s
Rb4Heap: 938971.2 i/s - 2.21x slower
Rb2Heap: 724438.8 i/s - 2.86x slower
Timers gem: 486637.5 i/s - 4.26x slower
pushpop w/ size= 100
BSearch: 2566420.3 i/s
Rb4Heap: 949501.8 i/s - 2.70x slower
Rb2Heap: 926443.3 i/s - 2.77x slower
Timers gem: 636383.8 i/s - 4.03x slower
pushpop w/ size= 32
BSearch: 2775608.3 i/s
Rb4Heap: 1217661.6 i/s - 2.28x slower
Rb2Heap: 1103274.6 i/s - 2.52x slower
Timers gem: 763645.1 i/s - 3.63x slower
pushpop w/ size= 10
BSearch: 3613651.1 i/s
Rb4Heap: 1616519.3 i/s - 2.24x slower
Rb2Heap: 1604217.9 i/s - 2.25x slower
Timers gem: 1111453.2 i/s - 3.25x slower
push N, pop N (N=3,162,278)
Rb4Heap: 812798.4 i/s
Rb2Heap: 745647.4 i/s - 1.09x slower
Timers gem: 553322.0 i/s - 1.47x slower
BSearch: 0.0 i/s
push N, pop N (N=1,000,000)
Rb4Heap: 905588.3 i/s
Rb2Heap: 880822.2 i/s - 1.03x slower
Timers gem: 631874.9 i/s - 1.43x slower
BSearch: 0.0 i/s
push N, pop N (N= 316,228)
Rb4Heap: 1000895.0 i/s
Rb2Heap: 980327.1 i/s - 1.02x slower
Timers gem: 738141.9 i/s - 1.36x slower
BSearch: 0.0 i/s
push N, pop N (N= 100,000)
Rb4Heap: 1108310.9 i/s
Rb2Heap: 1095127.5 i/s - 1.01x slower
Timers gem: 834546.7 i/s - 1.33x slower
BSearch: 401824.9 i/s - 2.76x slower
push N, pop N (N= 31,623)
Rb4Heap: 1261109.5 i/s
Rb2Heap: 1226813.8 i/s - 1.03x slower
BSearch: 1070724.9 i/s - 1.18x slower
Timers gem: 904562.3 i/s - 1.39x slower
push N, pop N (N= 10,000)
BSearch: 2155975.5 i/s
Rb2Heap: 1390968.2 i/s - 1.55x slower
Rb4Heap: 1355336.7 i/s - 1.59x slower
Timers gem: 1025826.4 i/s - 2.10x slower
push N, pop N (N= 3,162)
BSearch: 2855840.8 i/s
Rb4Heap: 1565201.4 i/s - 1.82x slower
Rb2Heap: 1537136.9 i/s - 1.86x slower
Timers gem: 1153285.6 i/s - 2.48x slower
push N, pop N (N= 1,000)
BSearch: 3343737.3 i/s
Rb2Heap: 1756879.1 i/s - 1.90x slower
Rb4Heap: 1738745.9 i/s - 1.92x slower
Timers gem: 1314908.8 i/s - 2.54x slower
push N, pop N (N= 316)
BSearch: 3828548.6 i/s
Rb2Heap: 2015092.7 i/s - 1.90x slower
Rb4Heap: 1985609.1 i/s - 1.93x slower
Timers gem: 1555583.0 i/s - 2.46x slower
push N, pop N (N= 100)
BSearch: 4575311.2 i/s
Rb4Heap: 2443439.9 i/s - 1.87x slower
Rb2Heap: 2381603.2 i/s - 1.92x slower
Timers gem: 1904935.1 i/s - 2.40x slower
push N, pop N (N= 32)
BSearch: 5145040.1 i/s
Rb4Heap: 2990969.2 i/s - 1.72x slower
Rb2Heap: 2935640.4 i/s - 1.75x slower
Timers gem: 2401568.5 i/s - 2.14x slower
push N, pop N (N= 10)
BSearch: 6166773.3 i/s
Rb4Heap: 3922461.5 i/s - 1.57x slower
Rb2Heap: 3920965.2 i/s - 1.57x slower
Timers gem: 3305228.9 i/s - 1.87x slower
---
# run this file using the benchmark-driver gem. e.g:
#
# benchmark-driver ruby_priority_queues.yml
#
# To filter specific benchmarks, use --filter="pushpop"
#
# See "benchmark-driver --help" for more options.
contexts:
###################################################################
# Priority Queue implementations
###################################################################
- name: Timers gem
prelude: |
# Copyright, 2021, by Wander Hillen.
# Copyright, 2021, by Samuel G. D. Williams. <http://www.codeotaku.com>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# A priority queue implementation using a standard binary minheap. It uses
# straight comparison of its contents to determine priority. This works
# because a Handle from Timers::Events implements the '<' operation by
# comparing the expiry time. See <https://en.wikipedia.org/wiki/Binary_heap>
# for explanations of the main methods.
class TimersPriorityHeap
# The heap is represented with an array containing a binary tree. See
# https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation for how this
# array is built up.
def initialize
@contents = []
end
# Returns the earliest timer or nil if the heap is empty.
def peek
@contents[0]
end
# Returns the number of elements in the heap
def size
@contents.size
end
def empty?
@contents.empty?
end
# Returns the earliest timer if the heap is non-empty and removes it from the heap.
# Returns nil if the heap is empty. (and doesn't change the heap in that case)
def pop
return nil if @contents.empty?
return @contents.pop if @contents.size == 1
value = @contents[0]
last = @contents.pop
@contents[0] = last
bubble_down(0)
value
end
# Inserts a new timer into the heap, then rearranges elements until the heap
# invariant is true again.
def <<(element)
@contents.push(element)
bubble_up(@contents.size - 1)
self
end
# Empties out the heap, discarding all elements
def clear
@contents = []
end
private
def bubble_up(index)
parent_index = (index - 1) / 2 # watch out, integer division!
while index > 0 && @contents[index] < @contents[parent_index]
# if the node has a smaller value than its parent, swap these nodes
# to uphold the minheap invariant and update the index of the 'current'
# node. If the node is already at index 0, we can also stop because that
# is the root of the heap.
# swap(index, parent_index)
@contents[index], @contents[parent_index] = @contents[parent_index], @contents[index]
index = parent_index
parent_index = (index - 1) / 2 # watch out, integer division!
end
end
def bubble_down(index)
swap_value = 0
swap_index = nil
while true
left_index = (2 * index) + 1
left_value = @contents[left_index]
if left_value.nil?
# This node has no children so it can't bubble down any further.
# We're done here!
return
end
# Determine which of the child nodes has the smallest value:
right_index = left_index + 1
right_value = @contents[right_index]
if right_value.nil? or right_value > left_value
swap_value = left_value
swap_index = left_index
else
swap_value = right_value
swap_index = right_index
end
if @contents[index] < swap_value
# No need to swap, the minheap invariant is already satisfied:
return
else
# At least one of the child node has a smaller value than the current
# node, swap current node with that child and update current node for
# if it might need to bubble down even further:
# swap(index, swap_index)
@contents[index], @contents[swap_index] = @contents[swap_index], @contents[index]
index = swap_index
end
end
end
end
D = 2
qklass = TimersPriorityHeap
q = TimersPriorityHeap.new
- name: BSearch
prelude: |
# A very simple example priority queue that is implemented with a sorted
# array.
#
# It uses Array#bsearch + Array#insert to push new values, and Array#pop
# to pop the min value.
class BSearch
attr_reader :a
# quick initialization by simply sorting the array once.
def initialize(count = nil)
@a = []
return unless count
count.times {|i| @a << yield(i) }
@a.sort!
end
def clear; @a.clear end
def empty?; @a.empty? end
def size; @a.size end
# Array#bsearch_index is O(log n)
# Array#insert is O(n)
#
# So this should be O(n).
#
# In practice though, memcpy has a *very* small constant factor.
def <<(score)
raise ArgumentError unless score
index = @a.bsearch_index {|other| score > other } || @a.length
@a.insert(index, score)
end
# Array#pop is O(1). It updates length without changing capacity or
# contents. No comparisons are necessary.
#
# shift is usually also O(1) and could be used if it were sorted
# normally.
def pop
@a.pop
end
end
D = 2
qklass = BSearch
q = BSearch.new
- name: Rb2Heap
prelude: |
# a very simple pure ruby binary heap
class Rb2Heap
attr_reader :a
# quick initialization by simply sorting the array once.
def initialize(count = nil)
@a = []
return unless count
count.times {|i| @a << yield(i) }
@a.sort!
end
def clear; @a.clear end
def empty?; @a.empty? end
def size; @a.size end
# On ruby 2.7.2 (no JIT):
# * Huge speedup from inlining everything inside the sift methods.
# * Inlined attr_accessor method calls too.
# * No speedup from manual unrolling of the loops.
# * Don't literally "swap" values. Only one assignment per layer.
#
# rubocop:disable Metrics/MethodLength, Metrics/AbcSize
def <<(value)
raise ArgumentError unless value
index = @a.size
while 0 < index # rubocop:disable Style/NumericPredicate
parent_index = (index - 1) / 2
parent_value = @a[parent_index]
break if parent_value <= value
@a[index] = parent_value
index = parent_index
end
@a[index] = value
self
end
def pop
return if @a.empty?
popped = @a.first
value = @a.pop
last_index = @a.size - 1
return popped unless 0 <= last_index
last_parent = (last_index - 1) / 2
index = 0
child_index = 1
while index <= last_parent
child_value = @a[child_index]
if child_index < last_index && @a[child_index + 1] < child_value
child_index += 1
child_value = @a[child_index]
end
break if value <= child_value
@a[index] = child_value
index = child_index
child_index = index * 2 + 1
end
@a[index] = value
popped
end
# rubocop:enable Metrics/MethodLength, Metrics/AbcSize
end
D = 2
qklass = Rb2Heap
q = Rb2Heap.new
- name: Rb4Heap
prelude: |
# a pure ruby quaternary heap.
#
# Unlike 4heaps written in low-level languages like C or go, this is only
# a tiny improvement over Rb2Heap. It's necessary to manually unroll the
# min-child-index loop, otherwise it's slower than Rb2Heap.
class Rb4Heap
attr_reader :a
# quick initialization by simply sorting the array once.
def initialize(count = nil)
@a = []
return unless count
count.times {|i| @a << yield(i) }
@a.sort!
end
def clear; @a.clear end
def empty?; @a.empty? end
def size; @a.size end
# rubocop:disable Metrics/CyclomaticComplexity, Metrics/BlockNesting, Metrics/PerceivedComplexity, Metrics/MethodLength, Metrics/AbcSize, Style/ParallelAssignment
def <<(value)
raise ArgumentError unless value
index = @a.size
while 0 < index # rubocop:disable Style/NumericPredicate
parent_index = (index - 1) / 4
parent_value = @a[parent_index]
break if parent_value <= value
@a[index] = parent_value
index = parent_index
end
@a[index] = value
self
end
def pop
return if @a.empty?
popped = @a.first
swap_val = @a.pop
last_idx = @a.size - 1
return popped unless 0 <= last_idx
last_parent = (last_idx - 1) / 4
index = 0
while index <= last_parent # first child
# find minimum child, idx and val
#
# n.b. in my benchmarks, With a while loop to find the min child,
# Rb2Heap is faster than Rb4Heap
# With the loop unrolled, Rb4Heap is faster.
min_idx = index * 4 + 1
min_val = @a[min_idx]
if min_idx + 3 <= last_idx
cmp_idx = min_idx + 1 # second child
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
cmp_idx += 1 # third child
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
cmp_idx += 1 # fourth child
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
elsif min_idx < last_idx # second child
cmp_idx = min_idx + 1
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
if cmp_idx < last_idx # third child
cmp_idx += 1
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
if cmp_idx < last_idx # fourth child
cmp_idx += 1
cmp_val = @a[cmp_idx]
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val
end
end
end
break if swap_val <= min_val
@a[index] = min_val
index = min_idx
min_idx = index * 4 + 1
end
@a[index] = swap_val
popped
end
# rubocop:enable Metrics/CyclomaticComplexity, Metrics/BlockNesting, Metrics/PerceivedComplexity, Metrics/MethodLength, Metrics/AbcSize, Style/ParallelAssignment
end
D = 4
qklass = Rb4Heap
q = Rb4Heap.new
prelude: |
############################################################################
# Create an array of random values.
#
# This is more than 4x faster than calling Kernel#rand inside the benchmarks.
############################################################################
random_idx = -1
random_len = 3_162_278
random_vals = Array.new(random_len) { rand(0..100_000_000) }
############################################################################
# Benchmark parameters
############################################################################
i = j = 0
############################################################################
# for pushpop
############################################################################
pushpop_i = 1
low_n = nil
high_n = nil
prefill = -> (qklass, n) {
raise "D is too big for that N" if n - D/2 < 0
low_n = n - D/2
high_n = n + (D-1)/2
raise "bad N range: (#{low_n}...#{high_n})" unless high_n - low_n == D - 1
if qklass.name == "BSearch"
qklass.new(n) {
random_idx += 1
random_vals.fetch(random_idx % random_len)
}
else
q = qklass.new
n.times do
random_idx += 1
q << random_vals.fetch(random_idx % random_len)
end
q
end
}
# in order to avoid any oddities that might come from pushing and popping to a
# heap that is always the exact same size, +/-1, we'll introduce some
# non-random "jitter", by growing or shrinking the heap every so often.
# how many reps between each "jitter" size adjustment
REPS_PER_SIZE_ADJUSTMENT = 100_000
# Because sift-up is simple and very fast, and sift-down is where most of the
# performance gains are made, we'll sift-down (i.e. pop) one at a time, but
# sift-up (i.e. push) all at once. We'll go from (N+D/2) down to (N-D/2)
teardown: |
if defined?(N) && defined?(N2) # in the push_n_then_pop_n bms
puts "teardown: size: %p, N: %p, i: %p, j: %p, __bmdv_i: %p" % [
q.size, N, i, j, __bmdv_i
]
unless q.empty? && N == i && N == j && __bmdv_i % N == 0
raise "N x 2 (%d) must be a multiple of loop_count %d" % [N2, __bmdv_i]
end
end
benchmark:
# for "push N", loop_count must be a multiple of N
- name: push N (N = 3,162,278)
prelude: |
exit 1 if qklass.name == "BSearch" # bsearch will take forever...
N = 3_162_278
loop_count: 6_324_556 # must be a multiple of N
script: &push_script |
q.clear if __bmdv_i % N == 0
random_idx += 1
q << random_vals.fetch(random_idx % random_len)
- name: push N (N = 1,000,000)
prelude: |
exit 1 if qklass.name == "BSearch" # bsearch will take forever...
N = 1_000_000
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
- name: push N (N = 316,228)
prelude: |
exit 1 if qklass.name == "BSearch" # toooo slooooow
N = 316_228
loop_count: 6_324_560 # must be a multiple of N
script: *push_script
- name: push N (N = 100,000)
prelude: N = 100_000
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
- name: push N (N = 31,623)
prelude: N = 31_623
loop_count: 6_324_600 # must be a multiple of N
script: *push_script
- name: push N (N = 10,000)
prelude: N = 10_000
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
- name: push N (N = 3,162)
prelude: N = 3162
loop_count: 6_324_000 # must be a multiple of N
script: *push_script
- name: push N (N = 1,000)
prelude: N = 1_000
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
- name: push N (N = 316)
prelude: N = 316
loop_count: 6_320_000 # must be a multiple of N
script: *push_script
- name: push N (N = 100)
prelude: N = 100
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
- name: push N (N = 32)
prelude: N = 32
loop_count: 6_400_000 # must be a multiple of N
script: *push_script
- name: push N (N = 10)
prelude: N = 10
loop_count: 6_000_000 # must be a multiple of N
script: *push_script
# no loop_count restrictions for pushpop.
- name: pushpop w/ size=3,162,278
prelude: |
exit 1 if qklass.name == "BSearch"
q = prefill.call qklass, 3_162_278
script: &pushpop_script |
random_idx += 1
q << random_vals.fetch(random_idx % random_len)
q.pop
if pushpop_i % REPS_PER_SIZE_ADJUSTMENT == 0
if q.size == low_n
D.times { q << random_vals.fetch(random_idx % random_len) }
else
q.pop
end
end
pushpop_i += 1
- name: pushpop w/ size=1,000,000
prelude: |
exit 1 if qklass.name == "BSearch"
q = prefill.call qklass, 1_000_000
script: *pushpop_script
- name: pushpop w/ size= 316,228
prelude: "q = prefill.call qklass, 316_228"
script: *pushpop_script
- name: pushpop w/ size= 100,000
prelude: "q = prefill.call qklass, 100_000"
script: *pushpop_script
- name: pushpop w/ size= 31,623
prelude: "q = prefill.call qklass, 31_623"
script: *pushpop_script
- name: pushpop w/ size= 10,000
prelude: "q = prefill.call qklass, 10_000"
script: *pushpop_script
- name: pushpop w/ size= 3,162
prelude: "q = prefill.call qklass, 3_162"
script: *pushpop_script
- name: pushpop w/ size= 1,000
prelude: "q = prefill.call qklass, 1_000"
script: *pushpop_script
- name: pushpop w/ size= 316
prelude: "q = prefill.call qklass, 316"
script: *pushpop_script
- name: pushpop w/ size= 100
prelude: "q = prefill.call qklass, 100"
script: *pushpop_script
- name: pushpop w/ size= 32
prelude: "q = prefill.call qklass, 32"
script: *pushpop_script
- name: pushpop w/ size= 10
prelude: "q = prefill.call qklass, 10"
script: *pushpop_script
# N2, i, and j are used to control push vs pop.
#
# This specific benchmark is only accurate when the loop_count exactly match a
# multiple of N2. Doing it like this (instead of by pushing and popping all N
# items each time through the loop) allows a more intuitive apples-to-apples
# comparison with the other benchmarks. It's basically measuring the
# *average* push or pop time.
#
# N values were chosen to approximate sqrt(10), so every *two* benchmarks
# grows/shrinkgs by 10x.
- name: push N, pop N (N=3,162,278)
prelude: |
exit 1 if qklass.name == "BSearch"
N = 3_162_278; N2 = N * 2
loop_count: 6_324_556 # must be a multiple of N * 2
script: &push_n_then_pop_n |
if i < N
q.clear if __bmdv_i == 0
random_idx += 1
q << random_vals.fetch(random_idx % random_len)
i += 1
elsif j < N
q.pop
j += 1
elsif q.empty?
j = 0
random_idx += 1
q << random_vals.fetch(random_idx % random_len)
i = 1
else
raise "q not empty! size: %p, N: %p, i: %p, j: %p, __bmdv_i: %p" % [
q.size, N, i, j, __bmdv_i
]
end
- name: push N, pop N (N=1,000,000)
prelude: |
exit 1 if qklass.name == "BSearch"
N = 1_000_000; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 316,228)
prelude: |
exit 1 if qklass.name == "BSearch"
N = 316_228; N2 = N * 2
loop_count: 6_324_560 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 100,000)
prelude: |
N = 100_000; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 31,623)
prelude: |
N = 31_623; N2 = N * 2
loop_count: 6_324_600 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 10,000)
prelude: |
N = 10_000; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 3,162)
prelude: |
N = 3162; N2 = N * 2
loop_count: 6_324_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 1,000)
prelude: |
N = 1_000; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 316)
prelude: |
N = 316; N2 = N * 2
loop_count: 6_320_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 100)
prelude: |
N = 100; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 32)
prelude: |
N = 32; N2 = N * 2
loop_count: 6_400_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
- name: push N, pop N (N= 10)
prelude: |
N = 10; N2 = N * 2
loop_count: 6_000_000 # must be a multiple of N * 2
script: *push_n_then_pop_n
@nevans
Copy link
Author

nevans commented Mar 24, 2021

I quickly extracted these from the timers gem and from my own benchmarks in my d_heap gem. Rb2Heap and Rb4Heap are my best attempts so far at optimizing a pure ruby heap (tested under ruby 2.7.2 w/o JIT). There are rspec tests for all implementations in their respective source gems.

The BSearch algorithm is always at least a little bit slower for push (using Array#bsearch_index + Array#insert), but it always much much faster for pop (using Array#pop). So certain applications--e.g. graphing algorithms which push more items than they pop--will always benefit from a proper heap. But if you expect your heap size to be under 10k items and will eventually pop every item you push, then the pure ruby heap will probably be slower than bsearch_index + index. Of course, if you are willing to use C (or Java) extensions then you can use d_heap, which is much faster than any of these. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment