Created
October 18, 2010 15:44
# Original blog post about Rubinius performance on the alioth mandelbrot benchmark:
# http://rfc2616.wordpress.com/2010/10/16/rubinius-vs-the-benchmark-from-hell/
#
# Problems with assumptions in the blog post:
#  * The C <-> Ruby comparison is apples to oranges because the Ruby code
#    is written to use blocks rather than loops. That imposes the overhead
#    of additional execution contexts per pixel.
#  * The output is written a byte at a time, which requires a fairly deep
#    chain of method calls before each byte is handed off to the OS.
#  * The work is done in the script body. Unless the implementation has
#    on-stack replacement, a JIT can do no effective work there.
#  * No warmup is performed to give the JIT an opportunity to work.
#
# See the different versions of mandelbrot.rb tested in the results below.
# Each file has a comment listing the changes made. When the blocks are replaced
# with loops, the Rubinius JIT handles the code well and runs
# about 1.7x faster than 1.9.2p0 in the results below. However, in the version
# that keeps the blocks, the '50.times' block uses a non-local
# exit in the 'break' statement. That form is too complex for the Rubinius
# JIT to handle at this time, so even when the JIT is given time to work, the
# result is slightly slower than 1.9.2p0.
#
# To investigate how the Rubinius JIT is working, try out these command options:
#   -Xjit.log=mandelbrot.log -Xjit.debug -Xjit.inline.debug
#
# For example, run this command:
#   rbx -Xjit.log=mandelbrot.log -Xjit.debug -Xjit.inline.debug \
#     mandelbrot.method.rb 200 > mandelbrot.output.txt
#
# mandelbrot.log will then contain a lot of information on exactly what the
# JIT is doing.
#
# Run times for larger image sizes can be extrapolated from the results below
# using the following formula:
#
#   let N be the image size (i.e., the value passed to the script)
#   let S be the seconds required to compute mandelbrot at N
#   then
#     S_per_N = sqrt(S) / N
#   or
#     S = (S_per_N * N) ^ 2
#
# Converting seconds to minutes, Rubinius should run the benchmark
# in 1. below (mandelbrot.rb) in ~48 minutes, while 1.9.2p0 should run it
# in ~82 minutes. For the benchmark in 2. below (mandelbrot.method.rb),
# Rubinius should run in ~108 minutes, while 1.9.2p0 should run in ~107 minutes.

# Running the modified mandelbrot.rb in 1. below on rbx master
#
$ rbx mandelbrot.rb 200 > mandelbrot.output.txt
0.443856 0.001552 0.445408 ( 0.450146)
$ diff mandelbrot.output.txt ~/Downloads/mandelbrot-output.txt
$ rbx mandelbrot.rb 600 > mandelbrot.output.txt
3.977509 0.010174 3.987683 ( 4.040616)
$ rbx mandelbrot.rb 1200 > mandelbrot.output.txt
15.883694 0.037721 15.921415 ( 16.041488)
$ rbx mandelbrot.rb 2400 > mandelbrot.output.txt
63.707972 0.154519 63.862491 ( 64.337398)

# Running the modified mandelbrot.rb in 1. below on 1.9.2p0
#
$ ruby1.9.2 mandelbrot.rb 200 > mandelbrot.output.txt
0.760000 0.000000 0.760000 ( 0.763492)
$ diff mandelbrot.output.txt ~/Downloads/mandelbrot-output.txt
$ ruby1.9.2 mandelbrot.rb 600 > mandelbrot.output.txt
6.800000 0.020000 6.820000 ( 6.851136)
$ ruby1.9.2 mandelbrot.rb 1200 > mandelbrot.output.txt
27.180000 0.060000 27.240000 ( 27.356711)
$ ruby1.9.2 mandelbrot.rb 2400 > mandelbrot.output.txt
108.870000 0.320000 109.190000 (109.921816)

# Running the modified mandelbrot.rb in 2. below on rbx master
#
$ rbx mandelbrot.method.rb 200 > mandelbrot.output.txt
1.233717 0.039253 1.272970 ( 1.204125)
$ diff mandelbrot.output.txt ~/Downloads/mandelbrot-output.txt
$ rbx mandelbrot.method.rb 600 > mandelbrot.output.txt
9.273957 0.306050 9.580007 ( 9.193582)
$ rbx mandelbrot.method.rb 1200 > mandelbrot.output.txt
36.796441 1.228065 38.024506 ( 36.703456)
$ rbx mandelbrot.method.rb 2400 > mandelbrot.output.txt
147.160603 4.976744 152.137347 (147.145323)

# Running the modified mandelbrot.rb in 2. below on 1.9.2p0
#
$ ruby1.9.2 mandelbrot.method.rb 200 > mandelbrot.output.txt
0.990000 0.010000 1.000000 ( 1.014719)
$ diff mandelbrot.output.txt ~/Downloads/mandelbrot-output.txt
$ ruby1.9.2 mandelbrot.method.rb 600 > mandelbrot.output.txt
8.910000 0.030000 8.940000 ( 9.012608)
$ ruby1.9.2 mandelbrot.method.rb 1200 > mandelbrot.output.txt
35.680000 0.120000 35.800000 ( 36.125448)
$ ruby1.9.2 mandelbrot.method.rb 2400 > mandelbrot.output.txt
142.430000 0.400000 142.830000 (144.191197)
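
The extrapolation formula in the notes above can be sketched in Ruby as follows. This is only an illustration: the helper name `extrapolate_seconds` is made up here, and the target size of 16000 is an assumption, since the notes do not state which N the minute estimates refer to. The measured inputs are the wall-clock times for size 2400 from the runs above.

```ruby
# Extrapolate run time from a measured image size to a target image size,
# assuming time grows with the square of N:
#   S_per_N = sqrt(S) / N   and   S = (S_per_N * N) ^ 2
# (extrapolate_seconds is a hypothetical helper name, not from the gist.)
def extrapolate_seconds(measured_seconds, measured_n, target_n)
  s_per_n = Math.sqrt(measured_seconds) / measured_n
  (s_per_n * target_n)**2
end

# ASSUMPTION: the target size is not given in the notes above.
TARGET_N = 16_000

# Wall-clock seconds at N = 2400, copied from the results above.
measured = {
  "rbx master, mandelbrot.rb"        => 64.337398,
  "1.9.2p0, mandelbrot.rb"           => 109.921816,
  "rbx master, mandelbrot.method.rb" => 147.145323,
  "1.9.2p0, mandelbrot.method.rb"    => 144.191197,
}

measured.each do |label, seconds|
  minutes = extrapolate_seconds(seconds, 2400, TARGET_N) / 60.0
  puts format("%-34s ~%.0f minutes", label, minutes)
end
```

With that assumed target size, the estimates come out roughly in line with the minute figures quoted in the notes. Because the model is purely quadratic in N, doubling the image size always quadruples the estimate.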
# The Computer Language Benchmarks Game
# http://shootout.alioth.debian.org/
#
# contributed by Karl von Laudermann
# modified by Jeremy Echols
# modified by Detlef Reichl
# modified by Joseph LaFata
# --------
# Modifications from original:
#
# * Put work into a method.
# * Write to a specified IO.
# * Replaced 'for ... in ...' and '50.times' with while loops.
# * Replaced writing each byte with setting bytes in a buffer
#   and writing the buffer out at the end.
# * Replaced unnecessary Float multiplication with Integer multiplication.
# * Run a warmup process for the JIT.
# * Use Benchmark.measure to calculate the duration.
# --------

require 'benchmark'

size = ARGV.shift.to_i

if defined?(RUBY_ENGINE) && RUBY_ENGINE == 'rbx'
  # Under Rubinius, define String#setbyte in terms of []=.
  class String
    def setbyte(i, b)
      self[i] = b
    end
  end
else
  # Elsewhere, define the String.pattern helper used to allocate the buffer.
  class String
    def self.pattern(n, c)
      ("" << c) * n
    end
  end
end

def mandelbrot(size, io)
  byte_acc = 0
  bit_num = 0
  byte = 0 # index of the next byte to write in buf

  # One bit per pixel, with each row padded to a whole number of bytes.
  image_size = ((size + 7) / 8) * size
  buf = String.pattern image_size, 0

  count_size = size - 1 # Index of the last column, used to detect the end of a row
  size_f = size.to_f

  y = 0
  while y < size
    ci = (2*y/size_f)-1.0

    x = 0
    while x < size
      zrzr = zr = 0.0
      zizi = zi = 0.0
      cr = (2*x/size_f)-1.5
      escape = 0b1

      k = 0
      while k < 50
        tr = zrzr - zizi + cr
        ti = 2.0*zr*zi + ci
        zr = tr
        zi = ti
        # cache the squares so they are not recalculated
        zrzr = zr*zr
        zizi = zi*zi
        if zrzr+zizi > 4.0
          escape = 0b0
          break
        end
        k += 1
      end

      byte_acc = (byte_acc << 1) | escape
      bit_num += 1

      # Code is very similar for these cases, but using separate blocks
      # ensures we skip the shifting when it's unnecessary, which is most cases.
      if (bit_num == 8)
        buf.setbyte byte, byte_acc
        byte += 1
        byte_acc = 0
        bit_num = 0
      elsif (x == count_size)
        buf.setbyte byte, byte_acc << (8 - bit_num)
        byte += 1
        byte_acc = 0
        bit_num = 0
      end
      x += 1
    end
    y += 1
  end

  io.puts "P4\n#{size} #{size}"
  io.write buf
end

# Warm up so the JIT has a chance to compile the method.
File.open "/dev/null", "w" do |f|
  200.times { mandelbrot 24, f }
end

duration = Benchmark.measure do
  mandelbrot size, STDOUT
end

STDERR.puts duration
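
The buffer handling in the file above packs eight pixels into each byte and pads the last byte of every row with zero bits. The following is a minimal, standalone sketch of that packing for a single row. The `pack_row` helper and its array-of-bits input are made up for illustration and are not part of the benchmark, and it uses a block for brevity where the benchmark deliberately uses `while` loops.

```ruby
# Pack one row of bits (1 or 0 per pixel) into bytes, most significant
# bit first, padding the final partial byte with zero bits on the right.
# (pack_row is a hypothetical helper for illustration only.)
def pack_row(bits)
  buf = "\0" * ((bits.size + 7) / 8) # one byte per 8 pixels, rounded up
  buf.force_encoding(Encoding::BINARY)

  byte_acc = 0
  bit_num = 0
  byte = 0
  last = bits.size - 1

  bits.each_with_index do |bit, x|
    byte_acc = (byte_acc << 1) | bit
    bit_num += 1

    if bit_num == 8
      # A full byte has been accumulated: store it as is.
      buf.setbyte(byte, byte_acc)
      byte += 1
      byte_acc = 0
      bit_num = 0
    elsif x == last
      # End of row with a partial byte: shift the bits up to the top.
      buf.setbyte(byte, byte_acc << (8 - bit_num))
      byte += 1
      byte_acc = 0
      bit_num = 0
    end
  end

  buf
end

# 11 pixels need 2 bytes: 0b11110000 and 0b10100000.
p pack_row([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1]).bytes # => [240, 160]
```

Writing into a preallocated buffer this way means the IO object is touched once per image instead of once per byte, which is the point of the "setting bytes in a buffer" modification listed in the file header.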
# The Computer Language Benchmarks Game
# http://shootout.alioth.debian.org/
#
# contributed by Karl von Laudermann
# modified by Jeremy Echols
# modified by Detlef Reichl
# modified by Joseph LaFata
# --------
# Modifications from original:
#
# * Put work into a method.
# * Write to a specified IO.
# * Run a warmup process for the JIT.
# * Use Benchmark.measure to calculate the duration.
# --------

require 'benchmark'

size = ARGV.shift.to_i

def mandelbrot(size, io)
  io.puts "P4\n#{size} #{size}"

  byte_acc = 0
  bit_num = 0

  count_size = size - 1 # Precomputed size for easy for..in looping
  range = 0..count_size

  # For..in loops are faster than .upto, .downto, .times, etc.
  # (That claim is not true, but the comment is left in from the original.)
  for y in range
    ci = (2.0*y/size)-1.0

    for x in range
      zrzr = zr = 0.0
      zizi = zi = 0.0
      cr = (2.0*x/size)-1.5
      escape = 0b1

      50.times do
        tr = zrzr - zizi + cr
        ti = 2.0*zr*zi + ci
        zr = tr
        zi = ti
        # cache the squares so they are not recalculated
        zrzr = zr*zr
        zizi = zi*zi
        if zrzr+zizi > 4.0
          escape = 0b0
          break
        end
      end

      byte_acc = (byte_acc << 1) | escape
      bit_num += 1

      # Code is very similar for these cases, but using separate blocks
      # ensures we skip the shifting when it's unnecessary, which is most cases.
      if (bit_num == 8)
        io.print byte_acc.chr
        byte_acc = 0
        bit_num = 0
      elsif (x == count_size)
        byte_acc <<= (8 - bit_num)
        io.print byte_acc.chr
        byte_acc = 0
        bit_num = 0
      end
    end
  end
end

# Warm up so the JIT has a chance to compile the method.
File.open "/dev/null", "w" do |f|
  200.times { mandelbrot 24, f }
end

duration = Benchmark.measure do
  mandelbrot size, STDOUT
end

STDERR.puts duration
# The Computer Language Benchmarks Game
# http://shootout.alioth.debian.org/
#
# contributed by Karl von Laudermann
# modified by Jeremy Echols
# modified by Detlef Reichl
# modified by Joseph LaFata

size = ARGV.shift.to_i

puts "P4\n#{size} #{size}"

byte_acc = 0
bit_num = 0

count_size = size - 1 # Precomputed size for easy for..in looping
range = 0..count_size

# For..in loops are faster than .upto, .downto, .times, etc.
# That's not true, but left it here
for y in range
  ci = (2.0*y/size)-1.0

  for x in range
    zrzr = zr = 0.0
    zizi = zi = 0.0
    cr = (2.0*x/size)-1.5
    escape = 0b1

    50.times do
      tr = zrzr - zizi + cr
      ti = 2.0*zr*zi + ci
      zr = tr
      zi = ti
      # preserve recalculation
      zrzr = zr*zr
      zizi = zi*zi
      if zrzr+zizi > 4.0
        escape = 0b0
        break
      end
    end

    byte_acc = (byte_acc << 1) | escape
    bit_num += 1

    # Code is very similar for these cases, but using separate blocks
    # ensures we skip the shifting when it's unnecessary, which is most cases.
    if (bit_num == 8)
      print byte_acc.chr
      byte_acc = 0
      bit_num = 0
    elsif (x == count_size)
      byte_acc <<= (8 - bit_num)
      print byte_acc.chr
      byte_acc = 0
      bit_num = 0
    end
  end
end

7 months later, I just ran the tests again (I have RVM set up with rather recent yet common versions of both Rubinius and YARV Ruby as shown below).
The results:
Ruby 1.9.3p194
gist-632443/ $ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
gist-632443/ $ ruby mandelbrot.rb 2400 > mandelbrot_mri.txt
93.180000 0.170000 93.350000 ( 93.471355)
gist-632443/ $ ruby mandelbrot.method.rb 2400 > mandelbrot_method_mri.txt
101.640000 0.170000 101.810000 (101.862339)
Rubinius 2.0.0dev
gist-632443/ $ ruby -v
rubinius 2.0.0dev (1.9.3 3986181c yyyy-mm-dd JI) [x86_64-apple-darwin11.3.0]
gist-632443/ $ rbx mandelbrot.rb 2400 > mandelbrot_rbx.txt
47.577254 0.080042 47.657296 ( 47.543491)
gist-632443/ $ rbx mandelbrot.method.rb 2400 > mandelbrot_method_rbx.txt
65.071445 2.535044 67.606489 ( 67.640278)
So, yeah, whatever you guys are doing behind the scenes, keep on doing it :)
And in all fairness, since JRuby 1.7.0.preview1 was just released:
JRuby 1.7.0.preview1
gist-632443/ $ ruby -v
jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-21 a8a5aa2) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04) [darwin-x86_64-java]
gist-632443/ $ java -version
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
gist-632443/ $ jruby mandelbrot.rb 2400 > mandelbrot_jruby.txt
14.270000 0.080000 14.350000 ( 14.137000)
gist-632443/ $ jruby mandelbrot.method.rb 2400 > mandelbrot_method_jruby.txt
23.890000 1.840000 25.730000 ( 25.296000)
No diffs between the resulting files, of course.
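
As a rough way to compare the timings above, the wall-clock figures can be turned into speedup factors relative to Ruby 1.9.3p194. This is only a sketch: the `speedups` helper and the layout of the `TIMINGS` table are made up for illustration, and the numbers are copied from the size 2400 runs above.

```ruby
# Wall-clock seconds at size 2400, copied from the runs above.
TIMINGS = {
  "mandelbrot.rb" => {
    "ruby 1.9.3p194"       => 93.471355,
    "rbx 2.0.0dev"         => 47.543491,
    "jruby 1.7.0.preview1" => 14.137,
  },
  "mandelbrot.method.rb" => {
    "ruby 1.9.3p194"       => 101.862339,
    "rbx 2.0.0dev"         => 67.640278,
    "jruby 1.7.0.preview1" => 25.296,
  },
}

# Hypothetical helper: how many times faster each run is than the baseline.
def speedups(runs, baseline)
  base = runs.fetch(baseline)
  runs.map { |name, seconds| [name, (base / seconds).round(2)] }.to_h
end

TIMINGS.each do |script, runs|
  puts script
  speedups(runs, "ruby 1.9.3p194").each do |name, factor|
    puts format("  %-22s %.2fx", name, factor)
  end
end
```

On these numbers Rubinius comes out at roughly 2x for mandelbrot.rb and roughly 1.5x for mandelbrot.method.rb, with JRuby further ahead on both. Single runs like these say nothing about variance, so the factors are indicative only.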
Moving the calculation into an object method speeds things up a bit: https://gist.github.com/1295228