Last active
March 24, 2021 17:27
-
-
Save nevans/6473e9fe1bf95930c5382d86db863b5b to your computer and use it in GitHub Desktop.
Benchmarks for some pure ruby priority queue implementations
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Run on an Intel(R) Core(TM) i7-1065G7 | |
# using ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux] | |
# n.b. when running these benchmarks, I did not shut down other | |
# processes (including web browser) nor did I run the benchmark | |
# multiple times (using "--repeat-count"). The numbers would be | |
# slightly faster and more stable/reliable if I'd done that. | |
# Maybe (maybe) I'll re-run them later. | |
Warming up -------------------------------------- | |
pushpop w/ size=3,162,278 256.112k i/s - 263.417k times in 1.028524s (3.90μs/i) | |
pushpop w/ size=1,000,000 283.858k i/s - 293.040k times in 1.032346s (3.52μs/i) | |
pushpop w/ size= 316,228 286.535k i/s - 316.712k times in 1.105318s (3.49μs/i) | |
pushpop w/ size= 100,000 291.138k i/s - 294.530k times in 1.011649s (3.43μs/i) | |
pushpop w/ size= 31,623 314.217k i/s - 335.852k times in 1.068854s (3.18μs/i) | |
pushpop w/ size= 10,000 326.015k i/s - 333.608k times in 1.023290s (3.07μs/i) | |
pushpop w/ size= 3,162 381.708k i/s - 405.600k times in 1.062592s (2.62μs/i) | |
pushpop w/ size= 1,000 441.730k i/s - 458.604k times in 1.038199s (2.26μs/i) | |
pushpop w/ size= 316 459.527k i/s - 477.636k times in 1.039407s (2.18μs/i) | |
pushpop w/ size= 100 622.711k i/s - 672.425k times in 1.079835s (1.61μs/i) | |
pushpop w/ size= 32 751.908k i/s - 774.423k times in 1.029944s (1.33μs/i) | |
pushpop w/ size= 10 1.085M i/s - 1.147M times in 1.057044s (921.95ns/i) | |
Calculating ------------------------------------- | |
Timers gem BSearch Rb2Heap Rb4Heap | |
push N (N = 3,162,278) 3.416M ERROR 3.873M 5.206M i/s - 6.325M times in 1.851497s 0.000000s 1.633142s 1.214862s | |
push N (N = 1,000,000) 3.411M ERROR 4.288M 5.170M i/s - 6.000M times in 1.759272s 0.000000s 1.399344s 1.160469s | |
push N (N = 316,228) 2.981M ERROR 4.277M 5.402M i/s - 6.325M times in 2.121835s 0.000000s 1.478898s 1.170712s | |
push N (N = 100,000) 3.416M 200.902k 4.313M 5.529M i/s - 6.000M times in 1.756650s 29.865273s 1.391108s 1.085232s | |
push N (N = 31,623) 3.462M 563.896k 4.417M 5.863M i/s - 6.325M times in 1.826679s 11.215906s 1.431938s 1.078645s | |
push N (N = 10,000) 3.346M 1.117M 4.225M 5.560M i/s - 6.000M times in 1.793187s 5.370607s 1.420072s 1.079041s | |
push N (N = 3,162) 3.432M 1.588M 4.463M 5.997M i/s - 6.324M times in 1.842489s 3.982931s 1.417132s 1.054586s | |
push N (N = 1,000) 3.423M 1.775M 4.376M 5.855M i/s - 6.000M times in 1.752874s 3.379436s 1.371119s 1.024803s | |
push N (N = 316) 3.519M 2.236M 4.327M 5.863M i/s - 6.320M times in 1.795894s 2.826953s 1.460678s 1.077881s | |
push N (N = 100) 3.560M 2.635M 4.623M 5.806M i/s - 6.000M times in 1.685267s 2.276668s 1.297843s 1.033357s | |
push N (N = 32) 3.923M 3.372M 4.959M 5.938M i/s - 6.400M times in 1.631234s 1.898139s 1.290469s 1.077737s | |
push N (N = 10) 4.227M 4.258M 5.516M 6.313M i/s - 6.000M times in 1.419456s 1.409042s 1.087831s 0.950421s | |
pushpop w/ size=3,162,278 255.441k ERROR 337.429k 369.298k i/s - 768.334k times in 3.007869s 0.000000s 2.277026s 2.080526s | |
pushpop w/ size=1,000,000 270.840k ERROR 380.666k 376.737k i/s - 851.574k times in 3.144193s 0.000000s 2.237061s 2.260393s | |
pushpop w/ size= 316,228 267.331k 31.958k 393.732k 450.095k i/s - 859.604k times in 3.215501s 26.898024s 2.183222s 1.909827s | |
pushpop w/ size= 100,000 290.206k 228.682k 434.168k 459.020k i/s - 873.415k times in 3.009642s 3.819340s 2.011696s 1.902784s | |
pushpop w/ size= 31,623 310.594k 956.823k 456.219k 507.833k i/s - 942.650k times in 3.034994s 0.985187s 2.066222s 1.856220s | |
pushpop w/ size= 10,000 312.430k 1.528M 499.221k 570.330k i/s - 978.044k times in 3.130438s 0.640248s 1.959139s 1.714873s | |
pushpop w/ size= 3,162 366.672k 1.789M 567.663k 665.152k i/s - 1.145M times in 3.123019s 0.639921s 2.017259s 1.721598s | |
pushpop w/ size= 1,000 447.877k 1.984M 645.278k 785.424k i/s - 1.325M times in 2.958826s 0.668052s 2.053672s 1.687230s | |
pushpop w/ size= 316 486.638k 2.074M 724.439k 938.971k i/s - 1.379M times in 2.832870s 0.664546s 1.902964s 1.468182s | |
pushpop w/ size= 100 636.384k 2.566M 926.443k 949.502k i/s - 1.868M times in 2.935543s 0.727914s 2.016456s 1.967486s | |
pushpop w/ size= 32 763.645k 2.776M 1.103M 1.218M i/s - 2.256M times in 2.953889s 0.812695s 2.044571s 1.852504s | |
pushpop w/ size= 10 1.111M 3.614M 1.604M 1.617M i/s - 3.254M times in 2.927673s 0.900466s 2.028385s 2.012949s | |
push N, pop N (N=3,162,278) 553.322k ERROR 745.647k 812.798k i/s - 6.325M times in 11.430155s 0.000000s 8.481966s 7.781211s | |
push N, pop N (N=1,000,000) 631.875k ERROR 880.822k 905.588k i/s - 6.000M times in 9.495550s 0.000000s 6.811817s 6.625528s | |
push N, pop N (N= 316,228) 738.142k ERROR 980.327k 1.001M i/s - 6.325M times in 8.568217s 0.000000s 6.451480s 6.318905s | |
push N, pop N (N= 100,000) 834.547k 401.825k 1.095M 1.108M i/s - 6.000M times in 7.189532s 14.931876s 5.478814s 5.413644s | |
push N, pop N (N= 31,623) 904.562k 1.071M 1.227M 1.261M i/s - 6.325M times in 6.991890s 5.906839s 5.155305s 5.015108s | |
push N, pop N (N= 10,000) 1.026M 2.156M 1.391M 1.355M i/s - 6.000M times in 5.848943s 2.782963s 4.313542s 4.426944s | |
push N, pop N (N= 3,162) 1.153M 2.856M 1.537M 1.565M i/s - 6.324M times in 5.483464s 2.214409s 4.114142s 4.040375s | |
push N, pop N (N= 1,000) 1.315M 3.344M 1.757M 1.739M i/s - 6.000M times in 4.563054s 1.794399s 3.415147s 3.450763s | |
push N, pop N (N= 316) 1.556M 3.829M 2.015M 1.986M i/s - 6.320M times in 4.062785s 1.650756s 3.136332s 3.182902s | |
push N, pop N (N= 100) 1.905M 4.575M 2.382M 2.443M i/s - 6.000M times in 3.149714s 1.311386s 2.519311s 2.455555s | |
push N, pop N (N= 32) 2.402M 5.145M 2.936M 2.991M i/s - 6.400M times in 2.664925s 1.243916s 2.180104s 2.139775s | |
push N, pop N (N= 10) 3.305M 6.167M 3.921M 3.922M i/s - 6.000M times in 1.815305s 0.972956s 1.530235s 1.529652s | |
Comparison: | |
push N (N = 3,162,278) | |
Rb4Heap: 5205985.6 i/s | |
Rb2Heap: 3872629.7 i/s - 1.34x slower | |
Timers gem: 3415913.9 i/s - 1.52x slower | |
BSearch: 0.0 i/s | |
push N (N = 1,000,000) | |
Rb4Heap: 5170325.4 i/s | |
Rb2Heap: 4287723.7 i/s - 1.21x slower | |
Timers gem: 3410501.9 i/s - 1.52x slower | |
BSearch: 0.0 i/s | |
push N (N = 316,228) | |
Rb4Heap: 5402320.5 i/s | |
Rb2Heap: 4276535.2 i/s - 1.26x slower | |
Timers gem: 2980703.3 i/s - 1.81x slower | |
BSearch: 0.0 i/s | |
push N (N = 100,000) | |
Rb4Heap: 5528770.6 i/s | |
Rb2Heap: 4313109.2 i/s - 1.28x slower | |
Timers gem: 3415591.6 i/s - 1.62x slower | |
BSearch: 200902.2 i/s - 27.52x slower | |
push N (N = 31,623) | |
Rb4Heap: 5863469.5 i/s | |
Rb2Heap: 4416811.1 i/s - 1.33x slower | |
Timers gem: 3462349.2 i/s - 1.69x slower | |
BSearch: 563895.6 i/s - 10.40x slower | |
push N (N = 10,000) | |
Rb4Heap: 5560494.7 i/s | |
Rb2Heap: 4225137.8 i/s - 1.32x slower | |
Timers gem: 3345997.2 i/s - 1.66x slower | |
BSearch: 1117192.2 i/s - 4.98x slower | |
push N (N = 3,162) | |
Rb4Heap: 5996667.1 i/s | |
Rb2Heap: 4462533.0 i/s - 1.34x slower | |
Timers gem: 3432313.5 i/s - 1.75x slower | |
BSearch: 1587775.5 i/s - 3.78x slower | |
push N (N = 1,000) | |
Rb4Heap: 5854782.3 i/s | |
Rb2Heap: 4375989.0 i/s - 1.34x slower | |
Timers gem: 3422949.9 i/s - 1.71x slower | |
BSearch: 1775444.3 i/s - 3.30x slower | |
push N (N = 316) | |
Rb4Heap: 5863357.7 i/s | |
Rb2Heap: 4326757.3 i/s - 1.36x slower | |
Timers gem: 3519138.1 i/s - 1.67x slower | |
BSearch: 2235622.4 i/s - 2.62x slower | |
push N (N = 100) | |
Rb4Heap: 5806319.8 i/s | |
Rb2Heap: 4623056.5 i/s - 1.26x slower | |
Timers gem: 3560267.2 i/s - 1.63x slower | |
BSearch: 2635430.5 i/s - 2.20x slower | |
push N (N = 32) | |
Rb4Heap: 5938370.7 i/s | |
Rb2Heap: 4959436.4 i/s - 1.20x slower | |
Timers gem: 3923409.4 i/s - 1.51x slower | |
BSearch: 3371723.6 i/s - 1.76x slower | |
push N (N = 10) | |
Rb4Heap: 6312993.5 i/s | |
Rb2Heap: 5515562.1 i/s - 1.14x slower | |
BSearch: 4258213.7 i/s - 1.48x slower | |
Timers gem: 4226971.0 i/s - 1.49x slower | |
pushpop w/ size=3,162,278 | |
Rb4Heap: 369297.9 i/s | |
Rb2Heap: 337428.7 i/s - 1.09x slower | |
Timers gem: 255441.3 i/s - 1.45x slower | |
BSearch: 0.0 i/s | |
pushpop w/ size=1,000,000 | |
Rb2Heap: 380666.3 i/s | |
Rb4Heap: 376737.1 i/s - 1.01x slower | |
Timers gem: 270840.2 i/s - 1.41x slower | |
BSearch: 0.0 i/s | |
pushpop w/ size= 316,228 | |
Rb4Heap: 450095.2 i/s | |
Rb2Heap: 393731.8 i/s - 1.14x slower | |
Timers gem: 267331.3 i/s - 1.68x slower | |
BSearch: 31957.9 i/s - 14.08x slower | |
pushpop w/ size= 100,000 | |
Rb4Heap: 459019.5 i/s | |
Rb2Heap: 434168.4 i/s - 1.06x slower | |
Timers gem: 290205.6 i/s - 1.58x slower | |
BSearch: 228682.2 i/s - 2.01x slower | |
pushpop w/ size= 31,623 | |
BSearch: 956823.2 i/s | |
Rb4Heap: 507833.1 i/s - 1.88x slower | |
Rb2Heap: 456219.1 i/s - 2.10x slower | |
Timers gem: 310593.7 i/s - 3.08x slower | |
pushpop w/ size= 10,000 | |
BSearch: 1527601.3 i/s | |
Rb4Heap: 570330.3 i/s - 2.68x slower | |
Rb2Heap: 499221.3 i/s - 3.06x slower | |
Timers gem: 312430.4 i/s - 4.89x slower | |
pushpop w/ size= 3,162 | |
BSearch: 1789476.5 i/s | |
Rb4Heap: 665151.7 i/s - 2.69x slower | |
Rb2Heap: 567663.2 i/s - 3.15x slower | |
Timers gem: 366672.1 i/s - 4.88x slower | |
pushpop w/ size= 1,000 | |
BSearch: 1983661.6 i/s | |
Rb4Heap: 785423.6 i/s - 2.53x slower | |
Rb2Heap: 645278.2 i/s - 3.07x slower | |
Timers gem: 447877.0 i/s - 4.43x slower | |
pushpop w/ size= 316 | |
BSearch: 2074469.9 i/s | |
Rb4Heap: 938971.2 i/s - 2.21x slower | |
Rb2Heap: 724438.8 i/s - 2.86x slower | |
Timers gem: 486637.5 i/s - 4.26x slower | |
pushpop w/ size= 100 | |
BSearch: 2566420.3 i/s | |
Rb4Heap: 949501.8 i/s - 2.70x slower | |
Rb2Heap: 926443.3 i/s - 2.77x slower | |
Timers gem: 636383.8 i/s - 4.03x slower | |
pushpop w/ size= 32 | |
BSearch: 2775608.3 i/s | |
Rb4Heap: 1217661.6 i/s - 2.28x slower | |
Rb2Heap: 1103274.6 i/s - 2.52x slower | |
Timers gem: 763645.1 i/s - 3.63x slower | |
pushpop w/ size= 10 | |
BSearch: 3613651.1 i/s | |
Rb4Heap: 1616519.3 i/s - 2.24x slower | |
Rb2Heap: 1604217.9 i/s - 2.25x slower | |
Timers gem: 1111453.2 i/s - 3.25x slower | |
push N, pop N (N=3,162,278) | |
Rb4Heap: 812798.4 i/s | |
Rb2Heap: 745647.4 i/s - 1.09x slower | |
Timers gem: 553322.0 i/s - 1.47x slower | |
BSearch: 0.0 i/s | |
push N, pop N (N=1,000,000) | |
Rb4Heap: 905588.3 i/s | |
Rb2Heap: 880822.2 i/s - 1.03x slower | |
Timers gem: 631874.9 i/s - 1.43x slower | |
BSearch: 0.0 i/s | |
push N, pop N (N= 316,228) | |
Rb4Heap: 1000895.0 i/s | |
Rb2Heap: 980327.1 i/s - 1.02x slower | |
Timers gem: 738141.9 i/s - 1.36x slower | |
BSearch: 0.0 i/s | |
push N, pop N (N= 100,000) | |
Rb4Heap: 1108310.9 i/s | |
Rb2Heap: 1095127.5 i/s - 1.01x slower | |
Timers gem: 834546.7 i/s - 1.33x slower | |
BSearch: 401824.9 i/s - 2.76x slower | |
push N, pop N (N= 31,623) | |
Rb4Heap: 1261109.5 i/s | |
Rb2Heap: 1226813.8 i/s - 1.03x slower | |
BSearch: 1070724.9 i/s - 1.18x slower | |
Timers gem: 904562.3 i/s - 1.39x slower | |
push N, pop N (N= 10,000) | |
BSearch: 2155975.5 i/s | |
Rb2Heap: 1390968.2 i/s - 1.55x slower | |
Rb4Heap: 1355336.7 i/s - 1.59x slower | |
Timers gem: 1025826.4 i/s - 2.10x slower | |
push N, pop N (N= 3,162) | |
BSearch: 2855840.8 i/s | |
Rb4Heap: 1565201.4 i/s - 1.82x slower | |
Rb2Heap: 1537136.9 i/s - 1.86x slower | |
Timers gem: 1153285.6 i/s - 2.48x slower | |
push N, pop N (N= 1,000) | |
BSearch: 3343737.3 i/s | |
Rb2Heap: 1756879.1 i/s - 1.90x slower | |
Rb4Heap: 1738745.9 i/s - 1.92x slower | |
Timers gem: 1314908.8 i/s - 2.54x slower | |
push N, pop N (N= 316) | |
BSearch: 3828548.6 i/s | |
Rb2Heap: 2015092.7 i/s - 1.90x slower | |
Rb4Heap: 1985609.1 i/s - 1.93x slower | |
Timers gem: 1555583.0 i/s - 2.46x slower | |
push N, pop N (N= 100) | |
BSearch: 4575311.2 i/s | |
Rb4Heap: 2443439.9 i/s - 1.87x slower | |
Rb2Heap: 2381603.2 i/s - 1.92x slower | |
Timers gem: 1904935.1 i/s - 2.40x slower | |
push N, pop N (N= 32) | |
BSearch: 5145040.1 i/s | |
Rb4Heap: 2990969.2 i/s - 1.72x slower | |
Rb2Heap: 2935640.4 i/s - 1.75x slower | |
Timers gem: 2401568.5 i/s - 2.14x slower | |
push N, pop N (N= 10) | |
BSearch: 6166773.3 i/s | |
Rb4Heap: 3922461.5 i/s - 1.57x slower | |
Rb2Heap: 3920965.2 i/s - 1.57x slower | |
Timers gem: 3305228.9 i/s - 1.87x slower |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
# run this file using the benchmark-driver gem. e.g: | |
# | |
# benchmark-driver ruby_priority_queues.yml | |
# | |
# To filter specific benchmarks, use --filter="pushpop" | |
# | |
# See "benchmark-driver --help" for more options. | |
contexts: | |
################################################################### | |
# Priority Queue implementations | |
################################################################### | |
- name: Timers gem | |
prelude: | | |
# Copyright, 2021, by Wander Hillen. | |
# Copyright, 2021, by Samuel G. D. Williams. <http://www.codeotaku.com> | |
# | |
# Permission is hereby granted, free of charge, to any person obtaining a copy | |
# of this software and associated documentation files (the "Software"), to deal | |
# in the Software without restriction, including without limitation the rights | |
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
# copies of the Software, and to permit persons to whom the Software is | |
# furnished to do so, subject to the following conditions: | |
# | |
# The above copyright notice and this permission notice shall be included in | |
# all copies or substantial portions of the Software. | |
# | |
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | |
# THE SOFTWARE. | |
# A priority queue implementation using a standard binary minheap. It uses | |
# straight comparison of its contents to determine priority. This works | |
# because a Handle from Timers::Events implements the '<' operation by | |
# comparing the expiry time. See <https://en.wikipedia.org/wiki/Binary_heap> | |
# for explanations of the main methods. | |
class TimersPriorityHeap | |
# The heap is represented with an array containing a binary tree. See | |
# https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation for how this | |
# array is built up. | |
def initialize | |
@contents = [] | |
end | |
# Returns the earliest timer or nil if the heap is empty. | |
def peek | |
@contents[0] | |
end | |
# Returns the number of elements in the heap | |
def size | |
@contents.size | |
end | |
def empty? | |
@contents.empty? | |
end | |
# Returns the earliest timer if the heap is non-empty and removes it from the heap. | |
# Returns nil if the heap is empty. (and doesn't change the heap in that case) | |
def pop | |
return nil if @contents.empty? | |
return @contents.pop if @contents.size == 1 | |
value = @contents[0] | |
last = @contents.pop | |
@contents[0] = last | |
bubble_down(0) | |
value | |
end | |
# Inserts a new timer into the heap, then rearranges elements until the heap | |
# invariant is true again. | |
def <<(element) | |
@contents.push(element) | |
bubble_up(@contents.size - 1) | |
self | |
end | |
# Empties out the heap, discarding all elements | |
def clear | |
@contents = [] | |
end | |
private | |
def bubble_up(index) | |
parent_index = (index - 1) / 2 # watch out, integer division! | |
while index > 0 && @contents[index] < @contents[parent_index] | |
# if the node has a smaller value than its parent, swap these nodes | |
# to uphold the minheap invariant and update the index of the 'current' | |
# node. If the node is already at index 0, we can also stop because that | |
# is the root of the heap. | |
# swap(index, parent_index) | |
@contents[index], @contents[parent_index] = @contents[parent_index], @contents[index] | |
index = parent_index | |
parent_index = (index - 1) / 2 # watch out, integer division! | |
end | |
end | |
def bubble_down(index) | |
swap_value = 0 | |
swap_index = nil | |
while true | |
left_index = (2 * index) + 1 | |
left_value = @contents[left_index] | |
if left_value.nil? | |
# This node has no children so it can't bubble down any further. | |
# We're done here! | |
return | |
end | |
# Determine which of the child nodes has the smallest value: | |
right_index = left_index + 1 | |
right_value = @contents[right_index] | |
if right_value.nil? or right_value > left_value | |
swap_value = left_value | |
swap_index = left_index | |
else | |
swap_value = right_value | |
swap_index = right_index | |
end | |
if @contents[index] < swap_value | |
# No need to swap, the minheap invariant is already satisfied: | |
return | |
else | |
# At least one of the child node has a smaller value than the current | |
# node, swap current node with that child and update current node for | |
# if it might need to bubble down even further: | |
# swap(index, swap_index) | |
@contents[index], @contents[swap_index] = @contents[swap_index], @contents[index] | |
index = swap_index | |
end | |
end | |
end | |
end | |
D = 2 | |
qklass = TimersPriorityHeap | |
q = TimersPriorityHeap.new | |
- name: BSearch | |
prelude: | | |
# A very simple example priority queue that is implemented with a sorted | |
# array. | |
# | |
# It uses Array#bsearch + Array#insert to push new values, and Array#pop | |
# to pop the min value. | |
class BSearch | |
attr_reader :a | |
# quick initialization by simply sorting the array once. | |
def initialize(count = nil) | |
@a = [] | |
return unless count | |
count.times {|i| @a << yield(i) } | |
@a.sort! | |
end | |
def clear; @a.clear end | |
def empty?; @a.empty? end | |
def size; @a.size end | |
# Array#bsearch_index is O(log n) | |
# Array#insert is O(n) | |
# | |
# So this should be O(n). | |
# | |
# In practice though, memcpy has a *very* small constant factor. | |
def <<(score) | |
raise ArgumentError unless score | |
index = @a.bsearch_index {|other| score > other } || @a.length | |
@a.insert(index, score) | |
end | |
# Array#pop is O(1). It updates length without changing capacity or | |
# contents. No comparisons are necessary. | |
# | |
# shift is usually also O(1) and could be used if it were sorted | |
# normally. | |
def pop | |
@a.pop | |
end | |
end | |
D = 2 | |
qklass = BSearch | |
q = BSearch.new | |
- name: Rb2Heap | |
prelude: | | |
# a very simple pure ruby binary heap | |
class Rb2Heap | |
attr_reader :a | |
# quick initialization by simply sorting the array once. | |
def initialize(count = nil) | |
@a = [] | |
return unless count | |
count.times {|i| @a << yield(i) } | |
@a.sort! | |
end | |
def clear; @a.clear end | |
def empty?; @a.empty? end | |
def size; @a.size end | |
# On ruby 2.7.2 (no JIT): | |
# * Huge speedup from inlining everything inside the sift methods. | |
# * Inlined attr_accessor method calls too. | |
# * No speedup from manual unrolling of the loops. | |
# * Don't literally "swap" values. Only one assignment per layer. | |
# | |
# rubocop:disable Metrics/MethodLength, Metrics/AbcSize | |
def <<(value) | |
raise ArgumentError unless value | |
index = @a.size | |
while 0 < index # rubocop:disable Style/NumericPredicate | |
parent_index = (index - 1) / 2 | |
parent_value = @a[parent_index] | |
break if parent_value <= value | |
@a[index] = parent_value | |
index = parent_index | |
end | |
@a[index] = value | |
self | |
end | |
def pop | |
return if @a.empty? | |
popped = @a.first | |
value = @a.pop | |
last_index = @a.size - 1 | |
return popped unless 0 <= last_index | |
last_parent = (last_index - 1) / 2 | |
index = 0 | |
child_index = 1 | |
while index <= last_parent | |
child_value = @a[child_index] | |
if child_index < last_index && @a[child_index + 1] < child_value | |
child_index += 1 | |
child_value = @a[child_index] | |
end | |
break if value <= child_value | |
@a[index] = child_value | |
index = child_index | |
child_index = index * 2 + 1 | |
end | |
@a[index] = value | |
popped | |
end | |
# rubocop:enable Metrics/MethodLength, Metrics/AbcSize | |
end | |
D = 2 | |
qklass = Rb2Heap | |
q = Rb2Heap.new | |
- name: Rb4Heap | |
prelude: | | |
# a pure ruby quaternary heap. | |
# | |
# Unlike 4heaps written in low-level languages like C or go, this is only | |
# a tiny improvement over Rb2Heap. It's necessary to manually unroll the | |
# min-child-index loop, otherwise it's slower than Rb2Heap. | |
class Rb4Heap | |
attr_reader :a | |
# quick initialization by simply sorting the array once. | |
def initialize(count = nil) | |
@a = [] | |
return unless count | |
count.times {|i| @a << yield(i) } | |
@a.sort! | |
end | |
def clear; @a.clear end | |
def empty?; @a.empty? end | |
def size; @a.size end | |
# rubocop:disable Metrics/CyclomaticComplexity, Metrics/BlockNesting, Metrics/PerceivedComplexity, Metrics/MethodLength, Metrics/AbcSize, Style/ParallelAssignment | |
def <<(value) | |
raise ArgumentError unless value | |
index = @a.size | |
while 0 < index # rubocop:disable Style/NumericPredicate | |
parent_index = (index - 1) / 4 | |
parent_value = @a[parent_index] | |
break if parent_value <= value | |
@a[index] = parent_value | |
index = parent_index | |
end | |
@a[index] = value | |
self | |
end | |
def pop | |
return if @a.empty? | |
popped = @a.first | |
swap_val = @a.pop | |
last_idx = @a.size - 1 | |
return popped unless 0 <= last_idx | |
last_parent = (last_idx - 1) / 4 | |
index = 0 | |
while index <= last_parent # first child | |
# find minimum child, idx and val | |
# | |
# n.b. in my benchmarks, With a while loop to find the min child, | |
# Rb2Heap is faster than Rb4Heap | |
# With the loop unrolled, Rb4Heap is faster. | |
min_idx = index * 4 + 1 | |
min_val = @a[min_idx] | |
if min_idx + 3 <= last_idx | |
cmp_idx = min_idx + 1 # second child | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
cmp_idx += 1 # third child | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
cmp_idx += 1 # fourth child | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
elsif min_idx < last_idx # second child | |
cmp_idx = min_idx + 1 | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
if cmp_idx < last_idx # third child | |
cmp_idx += 1 | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
if cmp_idx < last_idx # fourth child | |
cmp_idx += 1 | |
cmp_val = @a[cmp_idx] | |
min_idx, min_val = cmp_idx, cmp_val if cmp_val < min_val | |
end | |
end | |
end | |
break if swap_val <= min_val | |
@a[index] = min_val | |
index = min_idx | |
min_idx = index * 4 + 1 | |
end | |
@a[index] = swap_val | |
popped | |
end | |
# rubocop:enable Metrics/CyclomaticComplexity, Metrics/BlockNesting, Metrics/PerceivedComplexity, Metrics/MethodLength, Metrics/AbcSize, Style/ParallelAssignment | |
end | |
D = 4 | |
qklass = Rb4Heap | |
q = Rb4Heap.new | |
prelude: | | |
############################################################################ | |
# Create an array of random values. | |
# | |
# This is more than 4x faster than calling Kernel#rand inside the benchmarks. | |
############################################################################ | |
random_idx = -1 | |
random_len = 3_162_278 | |
random_vals = Array.new(random_len) { rand(0..100_000_000) } | |
############################################################################ | |
# Benchmark parameters | |
############################################################################ | |
i = j = 0 | |
############################################################################ | |
# for pushpop | |
############################################################################ | |
pushpop_i = 1 | |
low_n = nil | |
high_n = nil | |
prefill = -> (qklass, n) { | |
raise "D is too big for that N" if n - D/2 < 0 | |
low_n = n - D/2 | |
high_n = n + (D-1)/2 | |
raise "bad N range: (#{low_n}...#{high_n})" unless high_n - low_n == D - 1 | |
if qklass.name == "BSearch" | |
qklass.new(n) { | |
random_idx += 1 | |
random_vals.fetch(random_idx % random_len) | |
} | |
else | |
q = qklass.new | |
n.times do | |
random_idx += 1 | |
q << random_vals.fetch(random_idx % random_len) | |
end | |
q | |
end | |
} | |
# in order to avoid any oddities that might come from pushing and popping to a | |
# heap that is always the exact same size, +/-1, we'll introduce some | |
# non-random "jitter", by growing or shrinking the heap every so often. | |
# how many reps between each "jitter" size adjustment | |
REPS_PER_SIZE_ADJUSTMENT = 100_000 | |
# Because sift-up is simple and very fast, and sift-down is where most of the | |
# performance gains are made, we'll sift-down (i.e. pop) one at a time, but | |
# sift-up (i.e. push) all at once. We'll go from (N+D/2) down to (N-D/2) | |
teardown: | | |
if defined?(N) && defined?(N2) # in the push_n_then_pop_n bms | |
puts "teardown: size: %p, N: %p, i: %p, j: %p, __bmdv_i: %p" % [ | |
q.size, N, i, j, __bmdv_i | |
] | |
unless q.empty? && N == i && N == j && __bmdv_i % N == 0 | |
raise "N x 2 (%d) must be a multiple of loop_count %d" % [N2, __bmdv_i] | |
end | |
end | |
benchmark: | |
# for "push N", loop_count must be a multiple of N | |
- name: push N (N = 3,162,278) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" # bsearch will take forever... | |
N = 3_162_278 | |
loop_count: 6_324_556 # must be a multiple of N | |
script: &push_script | | |
q.clear if __bmdv_i % N == 0 | |
random_idx += 1 | |
q << random_vals.fetch(random_idx % random_len) | |
- name: push N (N = 1,000,000) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" # bsearch will take forever... | |
N = 1_000_000 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 316,228) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" # toooo slooooow | |
N = 316_228 | |
loop_count: 6_324_560 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 100,000) | |
prelude: N = 100_000 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 31,623) | |
prelude: N = 31_623 | |
loop_count: 6_324_600 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 10,000) | |
prelude: N = 10_000 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 3,162) | |
prelude: N = 3162 | |
loop_count: 6_324_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 1,000) | |
prelude: N = 1_000 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 316) | |
prelude: N = 316 | |
loop_count: 6_320_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 100) | |
prelude: N = 100 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 32) | |
prelude: N = 32 | |
loop_count: 6_400_000 # must be a multiple of N | |
script: *push_script | |
- name: push N (N = 10) | |
prelude: N = 10 | |
loop_count: 6_000_000 # must be a multiple of N | |
script: *push_script | |
# no loop_count restrictions for pushpop. | |
- name: pushpop w/ size=3,162,278 | |
prelude: | | |
exit 1 if qklass.name == "BSearch" | |
q = prefill.call qklass, 3_162_278 | |
script: &pushpop_script | | |
random_idx += 1 | |
q << random_vals.fetch(random_idx % random_len) | |
q.pop | |
if pushpop_i % REPS_PER_SIZE_ADJUSTMENT == 0 | |
if q.size == low_n | |
D.times { q << random_vals.fetch(random_idx % random_len) } | |
else | |
q.pop | |
end | |
end | |
pushpop_i += 1 | |
- name: pushpop w/ size=1,000,000 | |
prelude: | | |
exit 1 if qklass.name == "BSearch" | |
q = prefill.call qklass, 1_000_000 | |
script: *pushpop_script | |
- name: pushpop w/ size= 316,228 | |
prelude: "q = prefill.call qklass, 316_228" | |
script: *pushpop_script | |
- name: pushpop w/ size= 100,000 | |
prelude: "q = prefill.call qklass, 100_000" | |
script: *pushpop_script | |
- name: pushpop w/ size= 31,623 | |
prelude: "q = prefill.call qklass, 31_623" | |
script: *pushpop_script | |
- name: pushpop w/ size= 10,000 | |
prelude: "q = prefill.call qklass, 10_000" | |
script: *pushpop_script | |
- name: pushpop w/ size= 3,162 | |
prelude: "q = prefill.call qklass, 3_162" | |
script: *pushpop_script | |
- name: pushpop w/ size= 1,000 | |
prelude: "q = prefill.call qklass, 1_000" | |
script: *pushpop_script | |
- name: pushpop w/ size= 316 | |
prelude: "q = prefill.call qklass, 316" | |
script: *pushpop_script | |
- name: pushpop w/ size= 100 | |
prelude: "q = prefill.call qklass, 100" | |
script: *pushpop_script | |
- name: pushpop w/ size= 32 | |
prelude: "q = prefill.call qklass, 32" | |
script: *pushpop_script | |
- name: pushpop w/ size= 10 | |
prelude: "q = prefill.call qklass, 10" | |
script: *pushpop_script | |
# N2, i, and j are used to control push vs pop. | |
# | |
# This specific benchmark is only accurate when the loop_count exactly match a | |
# multiple of N2. Doing it like this (instead of by pushing and popping all N | |
# items each time through the loop) allows a more intuitive apples-to-apples | |
# comparison with the other benchmarks. It's basically measuring the | |
# *average* push or pop time. | |
# | |
# N values were chosen to approximate sqrt(10), so every *two* benchmarks | |
# grows/shrinkgs by 10x. | |
- name: push N, pop N (N=3,162,278) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" | |
N = 3_162_278; N2 = N * 2 | |
loop_count: 6_324_556 # must be a multiple of N * 2 | |
script: &push_n_then_pop_n | | |
if i < N | |
q.clear if __bmdv_i == 0 | |
random_idx += 1 | |
q << random_vals.fetch(random_idx % random_len) | |
i += 1 | |
elsif j < N | |
q.pop | |
j += 1 | |
elsif q.empty? | |
j = 0 | |
random_idx += 1 | |
q << random_vals.fetch(random_idx % random_len) | |
i = 1 | |
else | |
raise "q not empty! size: %p, N: %p, i: %p, j: %p, __bmdv_i: %p" % [ | |
q.size, N, i, j, __bmdv_i | |
] | |
end | |
- name: push N, pop N (N=1,000,000) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" | |
N = 1_000_000; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 316,228) | |
prelude: | | |
exit 1 if qklass.name == "BSearch" | |
N = 316_228; N2 = N * 2 | |
loop_count: 6_324_560 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 100,000) | |
prelude: | | |
N = 100_000; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 31,623) | |
prelude: | | |
N = 31_623; N2 = N * 2 | |
loop_count: 6_324_600 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 10,000) | |
prelude: | | |
N = 10_000; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 3,162) | |
prelude: | | |
N = 3162; N2 = N * 2 | |
loop_count: 6_324_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 1,000) | |
prelude: | | |
N = 1_000; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 316) | |
prelude: | | |
N = 316; N2 = N * 2 | |
loop_count: 6_320_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 100) | |
prelude: | | |
N = 100; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 32) | |
prelude: | | |
N = 32; N2 = N * 2 | |
loop_count: 6_400_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
- name: push N, pop N (N= 10) | |
prelude: | | |
N = 10; N2 = N * 2 | |
loop_count: 6_000_000 # must be a multiple of N * 2 | |
script: *push_n_then_pop_n | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
I quickly extracted these from the timers gem and from my own benchmarks in my d_heap gem.
Rb2Heap
andRb4Heap
are my best attempts so far at optimizing a pure ruby heap (tested under ruby 2.7.2 w/o JIT). There are rspec tests for all implementations in their respective source gems.The
BSearch
algorithm is always at least a little bit slower forpush
(usingArray#bsearch_index
+Array#insert
), but it always much much faster forpop
(usingArray#pop
). So certain applications--e.g. graphing algorithms which push more items than they pop--will always benefit from a proper heap. But if you expect your heap size to be under 10k items and will eventually pop every item you push, then the pure ruby heap will probably be slower thanbsearch_index
+index
. Of course, if you are willing to use C (or Java) extensions then you can use d_heap, which is much faster than any of these. :)