Created
January 7, 2012 13:01
Not all strings are created equal in Ruby 1.9.3
# Source: http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
# Observation: Not all strings are created equal in Ruby 1.9.3.
# Ruby actually uses three different types of string values:
# - Heap Strings,
# - Shared Strings, and
# - Embedded Strings
# How does Ruby create new string values?
# Whenever you create a string value in your Ruby 1.9 code, the interpreter goes through an algorithm similar to this:
# 1.> Is this a new string value or a copy of an existing string? If it's a copy, Ruby creates a Shared String. This is the fastest option, since Ruby only needs a new RString structure, not another copy of the existing string data.
# 2.> Is this a long string or a short string? If the new string value is 23 characters or fewer, Ruby creates an Embedded String. While not as fast as a Shared String, it's still fast because the 23 characters are simply copied right into the RString structure and there's no need to call malloc.
# 3.> Finally, for long string values, 24 characters or more, Ruby creates a Heap String: it calls malloc to get some new memory from the heap, and then copies the string value there. This is the slowest option.
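#
# A rough way to observe the embedded/heap split from Ruby itself, assuming the
# standard objspace extension is available. ObjectSpace.memsize_of reports how
# many bytes an object occupies; the exact numbers, and the length at which they
# change, depend on the Ruby version and platform, so none are shown here.
# string_memsize is a hypothetical helper name used only for this sketch.
require "objspace"

# Returns the reported memory size, in bytes, of a fresh string of the given length.
def string_memsize(length)
  ObjectSpace.memsize_of("x" * length)
end

[11, 12, 23, 24, 1000].each do |length|
  puts "#{length} chars: #{string_memsize(length)} bytes"
end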
#
# The value of RSTRING_EMBED_LEN_MAX was chosen to match the size of the len/ptr/capa values. That's where the limit of 23 comes from.
# Even worse, the value of RSTRING_EMBED_LEN_MAX on a 32-bit machine is smaller: only 11.
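#
# The arithmetic behind those two limits, as a minimal sketch. It assumes that
# len, ptr and capa each occupy one machine word and that one byte is kept for
# the terminating NUL; embed_len_max is a hypothetical helper, not an MRI API.
def embed_len_max(word_size_in_bytes)
  3 * word_size_in_bytes - 1
end

puts embed_len_max(8) # 64-bit machine => 23
puts embed_len_max(4) # 32-bit machine => 11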
# IMPORTANT:
# Don't refactor your code to use 11 chars! Despite the poorly chosen title,
# the author's goal in writing this was to explain an interesting MRI optimization and
# to encourage people to take a look at the Ruby C source code.
# CODE: Benchmarking Ruby string allocation
require "benchmark"

ITERATIONS = 100_000_000

# Times ITERATIONS concatenations, each allocating a new string one character longer than str.
def run(str, bench)
  bench.report("#{str.length + 1} chars") do
    ITERATIONS.times { new_string = str + 'x' }
  end
end

Benchmark.bm do |bench|
  run("123", bench)
  run("12345", bench)
  run("1234567", bench)
  run("123456789", bench)
  run("1234567890", bench)
  run("12345678901", bench)
  run("1234567890123", bench)
  run("123456789012345", bench)
  run("12345678901234567", bench)
  run("123456789012345678901", bench)
  run("12345678901234567890123", bench)
  run("1234567890123456789012345", bench)
  run("1234567890123456789012345678", bench)
  run("1234567890123456789012345678901", bench)
  run("123456789012345678901234567890123456789", bench)
end
# Output:
# $ ruby benchmarking_string_allocations_by_length.rb
#                user     system      total        real
#  4 chars  38.700000   0.040000  38.740000 ( 39.398413)
#  6 chars  44.190000   0.130000  44.320000 ( 45.957612)
#  8 chars  40.510000   0.080000  40.590000 ( 41.725284)
# 10 chars  46.690000   0.200000  46.890000 ( 49.430882)
# 11 chars  43.870000   0.090000  43.960000 ( 45.726567)
# 12 chars  57.280000   0.120000  57.400000 ( 59.581982)
# 14 chars  54.010000   0.100000  54.110000 ( 55.886625)
# 16 chars  54.860000   0.160000  55.020000 ( 56.859089)
# 18 chars  54.650000   0.140000  54.790000 ( 55.648667)
# 22 chars  56.190000   0.120000  56.310000 ( 57.547933)
# 24 chars  57.120000   0.120000  57.240000 ( 59.552048)
# 26 chars  58.040000   0.120000  58.160000 ( 60.899569)
# 29 chars  55.740000   0.160000  55.900000 ( 58.227123)
# 32 chars  56.870000   0.150000  57.020000 ( 59.611066)
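
# For a quicker look at a single boundary, the sketch below times the same
# str + "x" operation with Benchmark.realtime and a far smaller iteration count.
# QUICK_ITERATIONS and time_concat are hypothetical names used only here; timings
# vary by machine and Ruby version, so no results are shown.
require "benchmark"

QUICK_ITERATIONS = 100_000

# Returns the wall-clock seconds spent building QUICK_ITERATIONS new strings.
def time_concat(str)
  Benchmark.realtime do
    QUICK_ITERATIONS.times { str + "x" }
  end
end

# 22 + 1 = 23 chars and 23 + 1 = 24 chars: the two sides of the 64-bit limit.
[22, 23].each do |length|
  seconds = time_concat("x" * length)
  puts format("%d chars: %.4f s", length + 1, seconds)
end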