Not all strings are created equal in Ruby 1.9.3
# source: http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
# Observation: Not all strings are created equal in Ruby 1.9.3.
# Ruby actually uses three different types of string values:
# - Heap Strings,
# - Shared Strings, and
# - Embedded Strings
# How does Ruby create new string values?
# Whenever you create a string value in your Ruby 1.9 code, the interpreter goes through an algorithm similar to this:
# 1.> Is this a new string value? Or a copy of an existing string? If it’s a copy, Ruby creates a Shared String. This is the fastest option, since Ruby only needs a new RString structure, and not another copy of the existing string data.
# 2.> Is this a long string? Or a short string? If the new string value is 23 characters or fewer (on a 64-bit machine), Ruby creates an Embedded String. While not as fast as a Shared String, it’s still fast because the 23 characters are simply copied right into the RString structure and there’s no need to call malloc.
# 3.> Finally, for long string values, 24 characters or more, Ruby creates a Heap String - meaning it calls malloc and gets some new memory from the heap, and then copies the string value there. This is the slowest option.
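#
# The three steps above can be sketched in plain Ruby. This is only an illustration:
# MRI makes the decision in C, and the names string_kind, copy_of_existing and
# embed_len_max are hypothetical, invented for this sketch.

```ruby
# Hypothetical illustration only: MRI makes this decision in C, not in Ruby.
EMBED_LEN_MAX_64BIT = 23 # value of RSTRING_EMBED_LEN_MAX on a 64-bit build

def string_kind(length, copy_of_existing: false, embed_len_max: EMBED_LEN_MAX_64BIT)
  return :shared   if copy_of_existing         # step 1: reuse the existing string data
  return :embedded if length <= embed_len_max  # step 2: fits inside the RString structure
  :heap                                        # step 3: needs malloc
end

p string_kind(23)                          # => :embedded
p string_kind(24)                          # => :heap
p string_kind(100, copy_of_existing: true) # => :shared
```

# Passing embed_len_max: 11 models the 32-bit case.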
#
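# The value of RSTRING_EMBED_LEN_MAX was chosen so that the embedded character buffer occupies the same space as the len/ptr/capa values it replaces. On a 64-bit machine those are three 8-byte values, 24 bytes in total; one byte is kept for the terminating null, which is where the 23 limit comes from.
# Even worse, the value of RSTRING_EMBED_LEN_MAX on a 32-bit machine is smaller: only 11.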
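#
# The arithmetic behind both limits, as a minimal sketch. The helper embed_len_max
# is hypothetical; it mirrors the calculation and is not the text of MRI's C macro.
# It assumes len, ptr and capa are each one machine word wide.

```ruby
# Hypothetical helper: reproduces the arithmetic, not MRI's actual macro.
def embed_len_max(word_size_in_bytes)
  fields = 3                       # len, ptr, capa
  fields * word_size_in_bytes - 1  # one byte is kept for the trailing null
end

puts embed_len_max(8) # 64-bit machine: prints 23
puts embed_len_max(4) # 32-bit machine: prints 11
```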
# IMPORTANT:
# Don't refactor your code to keep strings within 23 (or 11) chars! Despite the poorly chosen title,
# the author's goal in writing this was to explain an interesting MRI optimization and
# to encourage people to take a look at the Ruby C source code.
# CODE: Benchmarking Ruby string allocation
require "benchmark"

ITERATIONS = 100_000_000

# Times ITERATIONS concatenations of str + 'x'. Each concatenation allocates
# a new string one character longer than str, hence the length + 1 label.
def run(str, bench)
  bench.report("#{str.length + 1} chars") do
    ITERATIONS.times { new_string = str + 'x' }
  end
end

Benchmark.bm do |bench|
  run("123", bench)
  run("12345", bench)
  run("1234567", bench)
  run("123456789", bench)
  run("1234567890", bench)
  run("12345678901", bench)
  run("1234567890123", bench)
  run("123456789012345", bench)
  run("12345678901234567", bench)
  run("123456789012345678901", bench)
  run("12345678901234567890123", bench)
  run("1234567890123456789012345", bench)
  run("1234567890123456789012346789", bench)
  run("1234567890123456789012345678901", bench)
  run("123456789012345678901234567890123456789", bench)
end
# Output:
# $ ruby benchmarking_string_allocations_by_length.rb
#                user     system      total         real
#  4 chars  38.700000   0.040000  38.740000 ( 39.398413)
#  6 chars  44.190000   0.130000  44.320000 ( 45.957612)
#  8 chars  40.510000   0.080000  40.590000 ( 41.725284)
# 10 chars  46.690000   0.200000  46.890000 ( 49.430882)
# 11 chars  43.870000   0.090000  43.960000 ( 45.726567)
# 12 chars  57.280000   0.120000  57.400000 ( 59.581982)
# 14 chars  54.010000   0.100000  54.110000 ( 55.886625)
# 16 chars  54.860000   0.160000  55.020000 ( 56.859089)
# 18 chars  54.650000   0.140000  54.790000 ( 55.648667)
# 22 chars  56.190000   0.120000  56.310000 ( 57.547933)
# 24 chars  57.120000   0.120000  57.240000 ( 59.552048)
# 26 chars  58.040000   0.120000  58.160000 ( 60.899569)
# 29 chars  55.740000   0.160000  55.900000 ( 58.227123)
# 32 chars  56.870000   0.150000  57.020000 ( 59.611066)
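#
# To make the jump in the output above easier to see, here is a small sketch that
# averages the "real" column on either side of 11 characters. The split point is an
# assumption: the timings step up between 11 and 12 chars, which would be consistent
# with the 32-bit limit of 11 mentioned earlier. The names REAL_SECONDS and mean are
# made up for this sketch; the numbers are copied from the output.

```ruby
# "real" seconds from the benchmark output, keyed by string length in chars.
REAL_SECONDS = {
  4  => 39.398413, 6  => 45.957612, 8  => 41.725284, 10 => 49.430882,
  11 => 45.726567, 12 => 59.581982, 14 => 55.886625, 16 => 56.859089,
  18 => 55.648667, 22 => 57.547933, 24 => 59.552048, 26 => 60.899569,
  29 => 58.227123, 32 => 59.611066
}

# Average the timings of a list of [chars, seconds] pairs.
def mean(pairs)
  pairs.map { |_chars, secs| secs }.inject(:+) / pairs.size
end

# Assumed split: lengths up to 11 on one side, 12 and above on the other.
short, long = REAL_SECONDS.partition { |chars, _secs| chars <= 11 }
short_mean = mean(short)
long_mean  = mean(long)

puts format("<= 11 chars: %.2f s", short_mean)
puts format(">= 12 chars: %.2f s", long_mean)
puts format("ratio:       %.2f", long_mean / short_mean)
```

# On these numbers the longer strings come out roughly 30% slower on average.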