-
-
Save davidrunger/2e38719d074d84d53519 to your computer and use it in GitHub Desktop.
require 'benchmark' | |
# generate some random strings | |
n = 5_000_000 | |
string_n = '' | |
string_2n = '' | |
letters = ('a'..'z').to_a | |
n.times { string_n << letters.sample } | |
(n * 2).times { string_2n << letters.sample } | |
n = 1_000 | |
puts 'Slice half of string:' | |
Benchmark.bm do |b| | |
b.report('includes last character; n') do | |
n.times do | |
string_n[(string_n.length / 2)..-1] | |
end | |
end | |
b.report('includes last character; 2n') do | |
n.times do | |
string_2n[(string_2n.length / 2)..-1] | |
end | |
end | |
b.report("doesn't include last character; n") do | |
n.times do | |
string_n[((string_n.length / 2) - 1)..-2] | |
end | |
end | |
b.report("doesn't include last character; 2n") do | |
n.times do | |
string_2n[((string_2n.length / 2) - 1)..-2] | |
end | |
end | |
end |
Slice half of string: | |
user system total real | |
includes last character; n 0.000000 0.000000 0.000000 ( 0.000378) | |
includes last character; 2n 0.000000 0.000000 0.000000 ( 0.000343) | |
doesn't include last character; n 0.540000 0.430000 0.970000 ( 1.028334) | |
doesn't include last character; 2n 1.270000 0.760000 2.030000 ( 2.069830) |
@ruggeri Oh, wow, so awesome! I only just now saw this comment. Super cool. Thanks for sharing those results. I wonder how much of the current Ruby implementation of String
is incompatible with SHARABLE_MIDDLE_SUBSTRING=1
. In other words, I wonder what sort of bugs would have cropped up if you had continued playing around with your custom build of the Ruby interpreter. Presumably at least some of the current Ruby String
implementation is incompatible with sharable middle substrings; otherwise the Ruby gods would have already given us this fast, new behavior as the default.
It's really cool to see a concrete example of one way that an enormous performance gain could/can be squeezed out of Ruby (at least for programs that are doing a lot of string slicing). Thanks again for sharing.
We experimented a little bit, but I suspect that all of the Ruby methods work fine even with this macro switched on. I think the reason why this is switched off by default is not the Ruby standard library, but because of the fact that many Ruby applications are using other C modules or libraries which expect to manipulate null-terminated strings. When they're passed Ruby in-memory Ruby strings that are not properly null-terminated, I'm guessing that there will be some unpredictable errors (especially if those errors only show up for sliced strings that do not include the final character, which might be a nightmare to debug).
We posited that's why it's switched off by default.
Interesting idea, and very plausible. I suspect you might be right. That would indeed be a bear to debug. If correct, it's tantalizing to think about the performance gain that would be possible for Ruby, but for the (relatively small, I would think) set of Ruby applications that need to use C libraries to work with Ruby-sliced strings. On the other hand, I guess if one has to slice a 2.5 megabyte substring 1,000 times just for it to take 1 second on a laptop computer, then maybe there aren't really too many noteworthy, real-world performance gains being left on the table, except for a very small set of apps that are doings a lot of slicing of large strings. Back on the other hand, every little bit helps. And transforming an operation from O(n)
to O(1)
is just so satisfying. :)
Cool find!
I compiled Ruby with SHARABLE_MIDDLE_SUBSTRING=1 and doing so suggests that indeed the string is shared (no copy occurs) even when excluding the last character. The first run is with the option on, and the second is my default ruby, where presumably middle sharing is disabled.