GC benchmarks for trunk vs gc-compact seem to be about the same:
$ make benchmark ITEM=gc
./revision.h unchanged
/Users/aaron/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/Users/aaron/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I./lib -I. -I.ext/common -r./prelude --disable-gem" \
$(ls ./benchmark/*gc*.{yml,rb} 2>/dev/null)
Calculating -------------------------------------
                                compare-ruby  built-ruby
vm1_gc_short_lived                    6.654M      6.863M i/s - 30.000M times in 4.508284s 4.371307s
vm1_gc_short_with_complex_long        7.717M      7.952M i/s - 30.000M times in 3.887685s 3.772617s
vm1_gc_short_with_long                5.953M      6.075M i/s - 30.000M times in 5.039375s 4.937911s
vm1_gc_short_with_symbol              7.593M      7.298M i/s - 30.000M times in 3.950842s 4.110539s
vm1_gc_wb_ary                        63.941M     74.356M i/s - 30.000M times in 0.469183s 0.403465s
vm1_gc_wb_ary_promoted               64.257M     62.325M i/s - 30.000M times in 0.466874s 0.481350s
vm1_gc_wb_obj                        72.445M     92.515M i/s - 30.000M times in 0.414109s 0.324271s
vm1_gc_wb_obj_promoted               77.314M     74.770M i/s - 30.000M times in 0.388029s 0.401229s
vm3_gc                                 0.937       0.949 i/s -   1.000 times in 1.067345s 1.053194s
vm3_gc_old_full                        0.377       0.373 i/s -   1.000 times in 2.650514s 2.677493s
vm3_gc_old_immediate                   0.599       0.543 i/s -   1.000 times in 1.669253s 1.840736s
vm3_gc_old_lazy                        0.451       0.416 i/s -   1.000 times in 2.218040s 2.403648s
Comparison:
vm1_gc_short_lived
  built-ruby: 6862936.0 i/s
  compare-ruby: 6654416.6 i/s - 1.03x slower
vm1_gc_short_with_complex_long
  built-ruby: 7952039.7 i/s
  compare-ruby: 7716674.6 i/s - 1.03x slower
vm1_gc_short_with_long
  built-ruby: 6075443.6 i/s
  compare-ruby: 5953119.2 i/s - 1.02x slower
vm1_gc_short_with_symbol
  compare-ruby: 7593318.1 i/s
  built-ruby: 7298312.9 i/s - 1.04x slower
vm1_gc_wb_ary
  built-ruby: 74355892.1 i/s
  compare-ruby: 63940935.6 i/s - 1.16x slower
vm1_gc_wb_ary_promoted
  compare-ruby: 64257165.7 i/s
  built-ruby: 62324711.7 i/s - 1.03x slower
vm1_gc_wb_obj
  built-ruby: 92515211.0 i/s
  compare-ruby: 72444694.5 i/s - 1.28x slower
vm1_gc_wb_obj_promoted
  compare-ruby: 77313809.0 i/s
  built-ruby: 74770268.3 i/s - 1.03x slower
vm3_gc
  built-ruby: 0.9 i/s
  compare-ruby: 0.9 i/s - 1.01x slower
vm3_gc_old_full
  compare-ruby: 0.4 i/s
  built-ruby: 0.4 i/s - 1.01x slower
vm3_gc_old_immediate
  compare-ruby: 0.6 i/s
  built-ruby: 0.5 i/s - 1.10x slower
vm3_gc_old_lazy
  compare-ruby: 0.5 i/s
  built-ruby: 0.4 i/s - 1.08x slower
[aaron@TC-275 ~/g/ruby (gc-compact)]$ /Users/aaron/.rbenv/shims/ruby -v
ruby 2.7.0dev (2019-04-08 trunk 67472) [x86_64-darwin18]
[aaron@TC-275 ~/g/ruby (gc-compact)]$ ./ruby -v
ruby 2.7.0dev (2019-04-08 gc-compact 67472) [x86_64-darwin18]
last_commit=fix compiler warning
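For anyone reading the Comparison output above: each "N.NNx slower" figure appears to be the faster rate divided by the slower one. A minimal sketch of that arithmetic, using three rows copied from the output (the hash layout and the slowdown helper are my own names for illustration, not part of benchmark-driver):

```ruby
# Rates in iterations per second, copied from the benchmark output above.
# compare = compare-ruby, built = built-ruby.
RATES = {
  "vm1_gc_short_lived"   => { compare: 6654416.6,  built: 6862936.0 },
  "vm1_gc_wb_obj"        => { compare: 72444694.5, built: 92515211.0 },
  "vm3_gc_old_immediate" => { compare: 0.599,      built: 0.543 },
}

# Hypothetical helper: returns [slower_side, ratio], where
# ratio = faster rate / slower rate, rounded to two decimals.
def slowdown(compare, built)
  if built >= compare
    [:compare, (built / compare).round(2)]
  else
    [:built, (compare / built).round(2)]
  end
end

RATES.each do |name, r|
  side, ratio = slowdown(r[:compare], r[:built])
  puts "#{name}: #{side}-ruby is #{ratio}x slower"
end
```

Most ratios sit within a few percent of 1.0 in either direction, which is why I read the two builds as roughly equivalent on these benchmarks.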
To test compaction impact, I recorded the heap just before processing the first request, compacted the heap after the first request finished, then recorded the heap after compaction.
This is the graph of the heap before compaction:
This is a graph of the heap after compaction:
Each column is a page, each square is a slot. Red slots are pinned (cannot move), black slots are filled but can move, white are empty.
Before compaction there were 595 pages. 99 / 595 were full (had no empty slots). 495 / 595 were fragmented (contained objects and free slots).
After compaction there were 519 pages. 451 / 519 were full (had no empty slots). 68 / 519 were fragmented (contained objects and free slots).
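To put the page counts in percentage terms, here is a small sketch of the arithmetic. It assumes the post-compaction figures should be read against the 519 pages that remained (451 + 68 = 519); the constant and helper names are just for illustration.

```ruby
# Page counts copied from the measurements above.
# Assumption: the "after" fractions use the post-compaction total of 519.
PAGE_STATS = {
  before: { total: 595, full: 99,  fragmented: 495 },
  after:  { total: 519, full: 451, fragmented: 68 },
}

# Percentage of `part` in `total`, rounded to one decimal place.
def percent(part, total)
  (100.0 * part / total).round(1)
end

PAGE_STATS.each do |label, s|
  puts format("%-6s pages=%d full=%.1f%% fragmented=%.1f%%",
              label, s[:total],
              percent(s[:full], s[:total]),
              percent(s[:fragmented], s[:total]))
end

freed = PAGE_STATS[:before][:total] - PAGE_STATS[:after][:total]
puts "pages freed: #{freed} (#{percent(freed, PAGE_STATS[:before][:total])}% of the original heap)"
```

Under that reading, full pages go from roughly 17% of the heap to roughly 87%, and 76 pages are released.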
Here is a graph of the pinned objects vs unpinned objects:
Most objects are unpinned so they can move around.
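The page categories and slot colors described above can be expressed as a small classifier. This is only an illustration on a made-up in-memory model (each page is an array of :pinned, :movable, or :empty symbols); it is not how the graphs were generated, and the method names are hypothetical.

```ruby
# Hypothetical model of a heap snapshot, for illustration only:
#   :pinned  -> filled slot that cannot move (red)
#   :movable -> filled slot that can move (black)
#   :empty   -> free slot (white)

# A page is :full when it has no empty slots, :fragmented when it mixes
# objects and free slots, and :empty when it holds no objects at all.
def classify_page(slots)
  filled = slots.count { |s| s != :empty }
  if filled.zero?
    :empty
  elsif filled == slots.size
    :full
  else
    :fragmented
  end
end

# Share of filled slots that are pinned, across all pages.
def pinned_ratio(pages)
  filled = pages.flatten.reject { |s| s == :empty }
  return 0.0 if filled.empty?
  filled.count(:pinned).to_f / filled.size
end

pages = [
  [:movable, :movable, :pinned, :movable],
  [:movable, :empty, :empty, :pinned],
  [:empty, :empty, :empty, :empty],
]

pages.each_with_index { |page, i| puts "page #{i}: #{classify_page(page)}" }
puts "pinned share of filled slots: #{pinned_ratio(pages).round(2)}"
```

The lower the pinned share, the more freedom compaction has to turn fragmented pages into full ones.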
For actual compaction performance I used a benchmark like this:
require 'benchmark/ips' # from the benchmark-ips gem

# Measure compaction on the baseline heap first.
GC.start
puts "Baseline #{GC.stat(:heap_live_slots)}"
Benchmark.ips do |x|
  x.report("compact") { GC.compact }
end

# Grow the heap by 100,000 retained objects per round and measure again.
garbage = []
5.times do
  100000.times { garbage << Object.new }
  puts "Larger #{GC.stat(:heap_live_slots)}"
  Benchmark.ips do |x|
    x.report("compact") { GC.compact }
  end
end
Results of the benchmark are below:
Live Objects, Iterations per Second
21260, 274.572
121267, 98.238
221271, 59.184
321306, 43.179
421306, 33.998
521306, 27.901
As expected, the more live objects, the longer compaction takes. I think we can improve this.
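To make the trend easier to see, the iterations-per-second figures above can be converted into time per compaction. A quick sketch using only the numbers from the results table (the helper names are mine):

```ruby
# [live objects, GC.compact iterations per second], copied from the results above.
RESULTS = [
  [21_260,  274.572],
  [121_267, 98.238],
  [221_271, 59.184],
  [321_306, 43.179],
  [421_306, 33.998],
  [521_306, 27.901],
]

# Milliseconds taken by a single compaction at the given rate.
def ms_per_compaction(ips)
  1000.0 / ips
end

# Microseconds of compaction time per live object.
def us_per_object(live, ips)
  1_000_000.0 / ips / live
end

RESULTS.each do |live, ips|
  puts format("%7d live: %6.2f ms per compaction, %.3f us per live object",
              live, ms_per_compaction(ips), us_per_object(live, ips))
end
```

By this arithmetic a compaction goes from a bit under 4 ms on the baseline heap to about 36 ms at the largest size, and each additional 100,000 live objects adds somewhere around 6 to 7 ms, so the cost looks roughly linear in the number of live objects.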