This is Japanese-English translation of my diary articles
These articles demonstration how to find out memory issues and tuning.
- How to estimate memory consumption
- How to use allocation_tracer gem to know object creation details
- How to optimize this typical issue in VM optimization
- How to use gc_tracer gem to know GC behaviour
- How to use valgrind/massif to know memory consumption details in memory
I hope this article help your Ruby life.
Ruby's performance tuning way.
Matz pointed out [this article] (http://kokizzu.blogspot.jp/2015/02/c-java-hhvm-ruby-nodejsrhinojscspidermo.html) for us and said "I want to reduce memory consumption of Ruby interpreter". It is an order to [our team] (https://blog.heroku.com/archives/2013/3/6/matz_highlights_ruby_2_0_at_waza).
Let's quote target program.
##############################
# scomb.rb
def newGap gap
gap /= 1.3;
gap = gap.to_i;
return 11 if gap == 9 or gap == 10
return 1 if gap < 1
gap
end
def combSort a
len = a.length
gap = len
swapped = false
begin
swapped = false
gap = newGap gap
(0...(len-gap)).each do |i|
if a[i] > a[i+gap]
swapped = true
a[i], a[i+gap] = a[i+gap], a[i]
end
end
end while gap > 1 or swapped
end
N = 10000000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
This program is simple. Let's quote an evaluation result, too.
$ ruby --version
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
$ time ruby scomb.rb
CPU: 114.67s Real: 115.21s RAM: 870612KB
I evaluated this program on my own environment (ruby trunk on Ubuntu on VirtualBox on Windows 7).
CPU: 108.87s Real: 109.11s RAM: 860996KB
Almost similar.
At first, understand what happens.
At last lines, this program makes an array containing 10000000 = 10_000_000 = 10M entries. On 64bit CPU, we can estimate required memory comsmption:
- An array having 10M items -> 1 entry needs 8B (pointer size) * 10M items = 80MB
- 10M short strings = 40 byte (size of RVALUE) * 10M = 400MB
About 500MB. When 10M slots (slot is an basic space for each objects) are required, then Ruby's GC allocates x1.8 slots, so 10M * 1.8 = 18M RVALUE will be allocated. So total slots consumes 18M * 40B = 720MB. 720MB + 80MB = 800MB. It is similar to the evaluation result: RAM: 860,996KB = 860MB.
Let's consider using GC tuning (Ruby's GC can tune by using several environment variables). Allocatiing 1.8 times slots seems too huge, doesn't it? Let's try this parameter to use 1.2. RUBY_GC_HEAP_GROWTH_FACTOR environ variable specify this factor. Let's try.
$ RUBY_GC_HEAP_GROWTH_FACTOR=1.2 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common ../../trunk/test.rb
CPU: 108.32s Real: 108.49s RAM: 745388KB
Ok. Memory consumption is reduced. But not so big reduction.
Originally, which part of program consumes memory? A method combSort()
seems no memory consuming parts. Let's try run prgoram without calling combSort(arr)
.
CPU: 3.09s Real: 3.16s RAM: 527452KB
This is 500MB which we estimated for this program consuming memory. Therefore, some parts of calling combSort(arr)
seems to be consuming memory. Which part of this program make object?, and which type of objects?
In this case, we can use [allocation tracer] (https://github.com/ko1/allocation_tracer) gem.
Like that:
##############################
# scomb.rb
def newGap gap
gap /= 1.3;
gap = gap.to_i;
return 11 if gap == 9 or gap == 10
return 1 if gap < 1
gap
end
def combSort a
len = a.length
gap = len
swapped = false
begin
swapped = false
gap = newGap gap
(0...(len-gap)).each do |i|
if a[i] > a[i+gap]
swapped = true
a[i], a[i+gap] = a[i+gap], a[i]
end
end
end while gap > 1 or swapped
end
require 'allocation_tracer'
require 'pp'
pp ObjectSpace::AllocationTracer.trace{
N = 1000#0000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}
Only surrounding ObjectSpace::AllocationTracer.trace
block. Btw, we only know a trend, making 1K entries instead of 10M entries.
Results are here:
{["../../trunk/test.rb", 31]=>[1004, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 17]=>[24, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 20]=>[2546, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 33]=>[1, 0, 0, 0, 0, 0]}
This means, for example, the result is [1004, 0, 0, 0, 0, 0] at the location ["../../trunk/test.rb", 31] (of course, it means ../../trunk/test.rb:31). It is difficult without any information.
- 1st column means 1004 objects are created at this line.
- 2nd column means 0 objects (created at this line) are promoted to old objects.
- 3rd column means total age of objects created at this line is 0.
- 4th column means the minimum age of objects created at this line is 0.
- 5th column means the maximum age of objects created at this line is 0.
- 6th column means that total memory consumption without RVALUE.
1st column (creation number) is important and enough for this situation.
Line 31 making an array by map
method so 1004 objects are created at this line. It is reasonable.
We need to check line 20. This line creats 2546 objects. This line is "a[i], a[i+gap] = a[i+gap], a[i]". What happen on this line?
To know more details, we can use allocation tracer gem like this:
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
...
}
We can collect statistics for each types (types are MRI specific types). A result is:
{["../../trunk/test.rb", 31, :T_DATA]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 31, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 31, :T_NODE]=>[2, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 31, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 20, :T_ARRAY]=>[2546, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 33, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}
Using this technique, we can understand T_ARRAY, that is to say array objects are created. Whay arrays?
Actually, ruby's multiple assignment (parallel assignment) returns an array. Maybe you don't know about that. So let's see the example.
x = (a, b = 1, 2)
p x #=> [1, 2]
A variable x
has a reference to the result of an expression (a, b = 1, 2)
. And it is [1, 2]
. We can check it with allocation tracer gem.
require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
a, b = 1, 2
}
and
{["../../trunk/test.rb", 5, :T_ARRAY]=>[1, 0, 0, 0, 0, 0]}
Okay, line 5, or a, b = 1, 2
creates an array.
You want to say that you don't want to use created array object [1, 2]
. In past I also had thought same thing, we optimize it. If it is clear that we don't need to use an array object, then we elimite to create an array object.
Let's try.
require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
a, b = 1, 2
nil
}
Result:
{}
Result shows that no objects are created.
Original program returns a result value of multiple assignment as a block result value. But we can specify nil
for block result value, and we can eliminate creating array objects.
Okay, let's insert nil
and try it again.
##############################
# scomb.rb
def newGap gap
gap /= 1.3;
gap = gap.to_i;
return 11 if gap == 9 or gap == 10
return 1 if gap < 1
gap
end
def combSort a
len = a.length
gap = len
swapped = false
begin
swapped = false
gap = newGap gap
(0...(len-gap)).each do |i|
if a[i] > a[i+gap]
swapped = true
a[i], a[i+gap] = a[i+gap], a[i]
nil # Insert nil here
end
end
end while gap > 1 or swapped
end
require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
N = 1000#0000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}
But...
{["../../trunk/test.rb", 32, :T_DATA]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 32, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 32, :T_NODE]=>[2, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 32, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 20, :T_ARRAY]=>[2546, 0, 0, 0, 0, 0], <-- here
["../../trunk/test.rb", 34, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}
Array objects are created. Why?
In fact, multiple assignments includes method calling like a[x], a[y] = 1, 2
create an array naively. This is because method invocation can cause side-effect and we need to sort execution order correctly. For example, this program should invoke a[x] = 1; a[y] = 2
one by one.
In reverse, let's consider the cases we don't need to care about an execution order.
a, b = 1, 2
In this case, Ruby's stack machine VM exceute like that.
push 1
push 2
set b # b = 2
set a # a = 1
In this case, an assignment order is b
and a
. However, it is possible that a[x], a[y]
has a side effect, a logging for example. So we can't do same thing like:
push 1
push 2
set a[y] # a[y] = 2
set a[x] # a[x] = 1
Because of this reason, we make an intermediate array.
push 1
push 2
newarray 2 # Make [1, 2]
expandarray 2 # Expand [1, 2] reversely.
set a[x] # a[x] = 1
set a[y] # a[y] = 2
We can complete multiple assignment.
Now, we understand everything. Let's rewrite multiple assignment like that:
##############################
# scomb.rb
def newGap gap
gap /= 1.3;
gap = gap.to_i;
return 11 if gap == 9 or gap == 10
return 1 if gap < 1
gap
end
def combSort a
len = a.length
gap = len
swapped = false
begin
swapped = false
gap = newGap gap
(0...(len-gap)).each do |i|
if a[i] > a[i+gap]
swapped = true
x, y = a[i+gap], a[i] # Rewrite here, not cool way
a[i] = x
a[i+gap] = y
nil
end
end
end while gap > 1 or swapped
end
require 'allocation_tracer'
require 'pp'
ObjectSpace::AllocationTracer.setup(%i{path line type})
pp ObjectSpace::AllocationTracer.trace{
N = 1000#0000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
}
and result is:
{["../../trunk/test.rb", 34, :T_DATA]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 34, :T_ARRAY]=>[1, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 34, :T_NODE]=>[2, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 34, :T_STRING]=>[1000, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 17, :T_STRUCT]=>[24, 0, 0, 0, 0, 0],
["../../trunk/test.rb", 36, :T_STRUCT]=>[1, 0, 0, 0, 0, 0]}
Congraturations, we don't make an array object any more at line 20.
Let's try with 10M entries.
$ /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common ../../trunk/test.rb
CPU: 102.89s Real: 103.14s RAM: 527712KB
Congraturations, it is estimated 500MB.
Based on this analysis, Nobu made a patch to tune this behavior in [r49614] (http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=49614).
For a[0], a[1] = x, y
, the compiled byte code from:
push x # stack: x
push y # stack: x y
newarray 2 # stack: [x, y]
expandarray 2, 0 # stack: y, x
set a[0] # stack: y # a[0] = x
set a[1] # stack: # a[1] = y
to
push x # stack: x
push y # stack: x y
swap # stack: y x
set a[0] # stack: y # a[0] = x
set a[1] # stack: # a[1] = y
It is only for two variables case, but most freqent case. We have an idea to introduce reverse
instruction for over two variables, but it is not frequent case. So that we only introduce this technique using an exsiting swap
instruction.
Btw, not balanced cases aren't supported, such as a, b, c = x, y
. They are exceptions.
Let's try on trunk.
##############################
# scomb.rb
def newGap gap
gap /= 1.3;
gap = gap.to_i;
return 11 if gap == 9 or gap == 10
return 1 if gap < 1
gap
end
def combSort a
len = a.length
gap = len
swapped = false
begin
swapped = false
gap = newGap gap
(0...(len-gap)).each do |i|
if a[i] > a[i+gap]
swapped = true
a[i], a[i+gap] = a[i+gap], a[i]
nil # there is still nil
end
end
end while gap > 1 or swapped
end
N = 10000000;
arr = N.times.map{|x| (N-x).to_s }
combSort arr
(1...N).each{ |z| print '!' if arr[z]<arr[z-1] }
and
CPU: 103.15s Real: 103.30s RAM: 527492KB
Congratulations, we can optimize completely.
nil
is not cool. How about to use GC tuning again?
We can estimate that required objects (old objects) are 10M + alpha objects. Let's try with RUBY_GC_HEAP_GROWTH_FACTOR=1.01 to restrict heap extension. However, using this parameter, we need huge overhead for initial code (making a 10M entries array). So that prepare 10M+alpha slots, in this case, 12M slots at initial time using RUBY_GC_HEAP_INIT_SLOTS=12000000
.
Let's try:
$ RUBY_GC_HEAP_INIT_SLOTS=12000000 RUBY_GC_HEAP_GROWTH_FACTOR=1.01 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ./miniruby -I../../trunk/lib -I. -I.ext/common ../../trunk/test.rb
CPU: 110.43s Real: 110.65s RAM: 670824KB
It is better than last trial "RAM: 745388KB".
However, we prepare 12M slots at first, it should be enough but 740MB is higher than estimated (12M * 40B = 480MB: slots) + (80MB: array) = 560MB. What happens?
In this case, we can use [gc_tracer] (https://github.com/ko1/gc_tracer) gem.
At the beggining of file, you only need to write:
require 'gc_tracer'
GC::Tracer.start_logging
and program outputs GC internal information at every events, like that.
type tick count heap_allocated_pages heap_sorted_length heap_allocatable_pages heap_available_slots heap_live_slots heap_free_slots heap_final_slots heap_marked_slots heap_swept_slots heap_eden_pages heap_tomb_pages total_allocated_pages total_freed_pages total_allocated_objects total_freed_objects malloc_increase_bytes malloc_increase_bytes_limit minor_gc_count major_gc_count remembered_wb_unprotected_objects remembered_wb_unprotected_objects_limit old_objects old_objects_limit oldmalloc_increase_bytes oldmalloc_increase_bytes_limit major_by gc_by have_finalizer immediate_sweep state
start 1424126502809096 1 29411 29411 0 11988251 11986265 1986 0 0 0 29411 0 29411 0 11988251 1986 0 33554432 0 0 0 0 0 0 0 16777216 0 newobj 0 0 none
end_mark 1424126503057311 1 29411 29411 0 11988251 11986265 1986 0 10008677 0 29411 0 29411 0 11988251 1986 0 33554432 1 0 164 0 6099 0 0 16777216 0newobj 0 0 marking
end_sweep 1424126503416026 1 29411 29411 2859 11988251 10415829 1572422 0 10008677 1979573 26552 2859 29411 0 12395402 1979573 0 33554432 1 0 164 0 6099 0 0 16777216 0newobj 0 0 sweeping
start 1424126503416614 2 29411 29411 2859 11988251 10415829 1572422 0 10008677 1979573 26552 2859 29411 0 12395402 1979573 0 32883343 1 0 164 0 6099 0 0 16777216 oldgen newobj 0 0 none
end_mark 1424126505920376 2 29412 29705 293 11988659 11988228 431 0 10011540 1979573 29412 0 29412 0 13967801 1979573 0 32883343 1 1 164 328 6099 12198 0 16777216 oldgen newobj 0 0 marking
end_sweep 1424126509653076 2 29704 29705 1426 12107679 11526419 581260 0 10011540 2096662 28278 1426 29704 0 15482679 3956260 0 32883343 1 1 164 328 6099 12198 0 16777216 oldgen newobj 0 0 sweeping
start 1424126510720439 3 29704 29705 0 12107679 12107679 0 0 10011540 2096662 29704 0 29704 0 16063939 3956260 0 32225676 1 1 164 328 6099 12198 0 16777216 0 newobj 0 0 none
end_mark 1424126511016490 3 29705 30001 296 12108087 12107679 408 0 10008677 2096662 29705 0 29705 0 16063939 3956260 0 32225676 2 1 173 328 10008482 12198 0 16777216 0 newobj 0 0 marking
end_sweep 1424126511495631 3 29705 30001 3035 12108087 10439327 1668760 0 10008677 2220177 26670 3035 29705 0 16494588 6055261 0 32225676 2 1 173 328 10008482 12198 0 16777216 0 newobj 0 0 sweeping
start 1424126511496421 4 29705 30001 3035 12108087 10439327 1668760 0 10008677 2220177 26670 3035 29705 0 16494588 6055261 0 31581162 2 1 173 328 10008482 12198 0 16777216 oldgen newobj 0 0 none
end_mark 1424126515062702 4 29706 30002 296 12108494 12108087 407 0 10011715 2220177 29706 0 29706 0 18163348 6055261 0 31581162 2 2 173 346 10008483 20016966 0 16777216 oldgen newobj 0 0 marking
end_sweep 1424126517959578 4 30001 30002 1514 12228736 11611552 617184 0 10011715 2217546 28487 1514 30001 0 19763184 8151632 0 31581162 2 2 173 346 10008483 20016966 0 16777216 oldgen newobj 0 0 sweeping
start 1424126518937223 5 30001 30002 0 12228736 12228736 0 0 10011715 2217546 30001 0 30001 0 20380368 8151632 0 30949538 2 2 173 346 10008483 20016966 0 16777216 0newobj 0 0 none
end_mark 1424126518939379 5 30002 30301 299 12229144 12228736 408 0 10008677 2217546 30002 0 30002 0 20380368 8151632 0 30949538 3 2 173 346 10008483 20016966 0 16777216 0 newobj 0 0 marking
end_sweep 1424126520330323 5 30002 30301 3714 12229144 10715191 1513953 0 10008677 2342458 26288 3714 30002 0 21086881 10371690 0 30949538 3 2 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126522730700 6 30002 30301 0 12229144 12229144 0 0 10008677 2342458 30002 0 30002 0 22600834 10371690 0 30330547 3 2 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126522733421 6 30003 30302 299 12229552 12229144 408 0 10008677 2342458 30003 0 30003 0 22600834 10371690 0 30330547 4 2 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126523700184 6 30003 30302 3715 12229552 10715222 1514330 0 10008677 2342866 26288 3715 30003 0 23307378 12592156 0 30330547 4 2 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126526054966 7 30003 30302 0 12229552 12229552 0 0 10008677 2342866 30003 0 30003 0 24821708 12592156 0 29723936 4 2 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126526057021 7 30004 30303 299 12229959 12229552 407 0 10008677 2342866 30004 0 30004 0 24821708 12592156 0 29723936 5 2 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126527206651 7 30004 30303 3715 12229959 10715610 1514349 0 10008677 2343273 26289 3715 30004 0 25528640 14813030 0 29723936 5 2 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126529599621 8 30004 30303 0 12229959 12229959 0 0 10008677 2343273 30004 0 30004 0 27042989 14813030 0 29129457 5 2 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126529601939 8 30004 30303 0 12229959 12229959 0 0 10008677 2343273 30004 0 30004 0 27042989 14813030 0 29129457 6 2 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126530234409 8 30004 30303 3213 12229959 10464694 1765265 0 10008677 2221281 26791 3213 30004 0 27499005 17034311 0 29129457 6 2 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126530235030 9 30004 30303 3213 12229959 10464768 1765191 0 10008677 2221281 26791 3213 30004 0 27499079 17034311 0 28546867 6 2 173 346 10008483 20016966 0 16777216 nofree newobj 0 0 none
end_mark 1424126533329440 9 30005 30304 299 12230366 12229959 407 0 10011894 2221281 30005 0 30005 0 29264270 17034311 0 28546867 6 3 173 346 10008483 20016966 016777216 nofree newobj 0 0 marking
end_sweep 1424126535800255 9 30303 30304 1604 12351831 11697946 653885 0 10011894 2340463 28699 1604 30303 0 30950321 19252375 0 28546867 6 3 173 346 10008483 20016966 016777216 nofree newobj 0 0 sweeping
start 1424126536666685 10 30303 30304 0 12351831 12351831 0 0 10011894 2340463 30303 0 30303 0 31604206 19252375 0 27975929 6 3 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126536668752 10 30304 30606 302 12352239 12351831 408 0 10008677 2340463 30304 0 30304 0 31604206 19252375 0 27975929 7 3 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126537715101 10 30304 30606 3938 12352239 10746976 1605263 0 10008677 2466777 26366 3938 30304 0 32342504 21595528 0 27975929 7 3 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126539759913 11 30304 30606 0 12352239 12352239 0 0 10008677 2466777 30304 0 30304 0 33947767 21595528 0 27416410 7 3 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126539762060 11 30305 30607 302 12352646 12352239 407 0 10008677 2466777 30305 0 30305 0 33947767 21595528 0 27416410 8 3 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126540779275 11 30305 30607 3938 12352646 10747410 1605236 0 10008677 2467184 26367 3938 30305 0 34686499 23939089 0 27416410 8 3 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126542891783 12 30305 30607 0 12352646 12352646 0 0 10008677 2467184 30305 0 30305 0 36291735 23939089 0 26868081 8 3 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126542894134 12 30306 30608 302 12353054 12352646 408 0 10008677 2467184 30306 0 30306 0 36291735 23939089 0 26868081 9 3 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126543915024 12 30306 30608 3938 12353054 10747832 1605222 0 10008677 2467592 26368 3938 30306 0 37030889 26283057 0 26868081 9 3 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126545944224 13 30306 30608 0 12353054 12353054 0 0 10008677 2467592 30306 0 30306 0 38636111 26283057 0 26330719 9 3 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126545946459 13 30306 30608 0 12353054 12353054 0 0 10008677 2467592 30306 0 30306 0 38636111 26283057 0 26330719 10 3 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126546575020 13 30306 30608 3393 12353054 10489565 1863489 0 10008677 2344376 26913 3393 30306 0 39116998 28627433 0 26330719 10 3 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126546575643 14 30306 30608 3393 12353054 10489639 1863415 0 10008677 2344376 26913 3393 30306 0 39117072 28627433 0 25804104 10 3 173 346 10008483 20016966 0 16777216 nofree newobj 0 0 none
end_mark 1424126549474776 14 30307 30609 302 12353462 12353054 408 0 10012072 2344376 30307 0 30307 0 40980487 28627433 0 25804104 10 4 173 346 10008483 20016966 016777216 nofree newobj 0 0 marking
end_sweep 1424126551736504 14 30608 30609 1696 12476150 11784748 691402 0 10012072 2464605 28912 1696 30608 0 42753162 30968414 0 25804104 10 4 173 346 10008483 20016966 016777216 nofree newobj 0 0 sweeping
start 1424126552572440 15 30608 30609 0 12476150 12476150 0 0 10012072 2464605 30608 0 30608 0 43444564 30968414 0 25288021 10 4 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126552574628 15 30609 30914 305 12476557 12476150 407 0 10008677 2464605 30609 0 30609 0 43444564 30968414 0 25288021 11 4 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126553574614 15 30609 30914 4163 12476557 10779627 1696930 0 10008677 2592319 26446 4163 30609 0 44215513 33435886 0 25288021 11 4 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126555693333 16 30609 30914 0 12476557 12476557 0 0 10008677 2592319 30609 0 30609 0 45912443 33435886 0 24782260 11 4 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126555695618 16 30610 30915 305 12476965 12476557 408 0 10008677 2592319 30610 0 30610 0 45912443 33435886 0 24782260 12 4 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126556637992 16 30610 30915 4164 12476965 10779632 1697333 0 10008677 2592727 26446 4164 30610 0 46683397 35903765 0 24782260 12 4 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126558691606 17 30610 30915 0 12476965 12476965 0 0 10008677 2592727 30610 0 30610 0 48380730 35903765 0 24286614 12 4 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126558693863 17 30611 30916 305 12477372 12476965 407 0 10008677 2592727 30611 0 30611 0 48380730 35903765 0 24286614 13 4 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126559661498 17 30611 30916 4165 12477372 10779603 1697769 0 10008677 2593134 26446 4165 30611 0 49151655 38372052 0 24286614 13 4 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126561720964 18 30611 30916 0 12477372 12477372 0 0 10008677 2593134 30611 0 30611 0 50849424 38372052 0 23800881 13 4 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126561723441 18 30611 30916 0 12477372 12477372 0 0 10008677 2593134 30611 0 30611 0 50849424 38372052 0 23800881 14 4 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126562369895 18 30611 30916 3576 12477372 10514399 1962973 0 10008677 2468694 27035 3576 30611 0 51355145 40840746 0 23800881 14 4 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126562370577 19 30611 30916 3576 12477372 10514473 1962899 0 10008677 2468694 27035 3576 30611 0 51355219 40840746 0 23324863 14 4 173 346 10008483 20016966 0 16777216 nofree newobj 0 0 none
end_mark 1424126565561276 19 30612 30917 305 12477780 12477372 408 0 10012256 2468694 30612 0 30612 0 53318118 40840746 0 23324863 14 5 173 346 10008483 20016966 016777216 nofree newobj 0 0 marking
end_sweep 1424126567940458 19 30916 30917 1787 12601690 11873250 728440 0 10012256 2589963 29129 1787 30916 0 55179111 43305861 0 23324863 14 5 173 346 10008483 20016966 016777216 nofree newobj 0 0 sweeping
start 1424126568805749 20 30916 30917 0 12601690 12601690 0 0 10012256 2589963 30916 0 30916 0 55907551 43305861 0 22858365 14 5 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126568808097 20 30917 31225 308 12602098 12601690 408 0 10008677 2589963 30917 0 30917 0 55907551 43305861 0 22858365 15 5 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126569823157 20 30917 31225 4391 12602098 10812225 1789873 0 10008677 2719084 26526 4391 30917 0 56711098 45898873 0 22858365 15 5 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126571993522 21 30917 31225 0 12602098 12602098 0 0 10008677 2719084 30917 0 30917 0 58500971 45898873 0 22401197 15 5 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126571995939 21 30918 31226 308 12602506 12602098 408 0 10008677 2719084 30918 0 30918 0 58500971 45898873 0 22401197 16 5 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126572995120 21 30918 31226 4391 12602506 10812573 1789933 0 10008677 2719492 26527 4391 30918 0 59304866 48492293 0 22401197 16 5 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126575233588 22 30918 31226 0 12602506 12602506 0 0 10008677 2719492 30918 0 30918 0 61094799 48492293 0 21953173 16 5 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126575236602 22 30919 31227 308 12602913 12602506 407 0 10008677 2719492 30919 0 30919 0 61094799 48492293 0 21953173 17 5 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126576237601 22 30919 31227 4393 12602913 10812191 1790722 0 10008677 2719899 26526 4393 30919 0 61898312 51086121 0 21953173 17 5 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126578559341 23 30919 31227 0 12602913 12602913 0 0 10008677 2719899 30919 0 30919 0 63689034 51086121 0 21514109 17 5 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126578561550 23 30919 31227 0 12602913 12602913 0 0 10008677 2719899 30919 0 30919 0 63689034 51086121 0 21514109 18 5 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126579347093 23 30919 31227 3762 12602913 10539261 2063652 0 10008677 2594235 27157 3762 30919 0 64219617 53680356 0 21514109 18 5 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126579347996 24 30919 31227 3762 12602913 10539335 2063578 0 10008677 2594235 27157 3762 30919 0 64219691 53680356 0 21083826 18 5 173 346 10008483 20016966 0 16777216 nofree newobj 0 0 none
end_mark 1424126582822956 24 30920 31228 308 12603320 12602913 407 0 10012442 2594235 30920 0 30920 0 66283269 53680356 0 21083826 18 6 173 346 10008483 20016966 016777216 nofree newobj 0 0 marking
end_sweep 1424126585323074 24 31227 31228 1878 12728454 11962886 765568 0 10012442 2716541 29349 1878 31227 0 68233712 56270826 0 21083826 18 6 173 346 10008483 20016966 016777216 nofree newobj 0 0 sweeping
start 1424126586349636 25 31227 31228 0 12728454 12728454 0 0 10012442 2716541 31227 0 31227 0 68999280 56270826 0 20662149 18 6 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126586351756 25 31228 31539 311 12728861 12728454 407 0 10008677 2716541 31228 0 31228 0 68999280 56270826 0 20662149 19 6 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126587374898 25 31228 31539 4621 12728861 10845248 1883613 0 10008677 2847071 26607 4621 31228 0 69835850 58990602 0 20662149 19 6 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126589528536 26 31228 31539 0 12728861 12728861 0 0 10008677 2847071 31228 0 31228 0 71719463 58990602 0 20248906 19 6 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126589530688 26 31229 31540 311 12729269 12728861 408 0 10008677 2847071 31229 0 31229 0 71719463 58990602 0 20248906 20 6 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126590517575 26 31229 31540 4622 12729269 10845240 1884029 0 10008677 2847479 26607 4622 31229 0 72556025 61710785 0 20248906 20 6 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126592712832 27 31229 31540 0 12729269 12729269 0 0 10008677 2847479 31229 0 31229 0 74440054 61710785 0 19843927 20 6 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126592715386 27 31230 31541 311 12729677 12729269 408 0 10008677 2847479 31230 0 31230 0 74440054 61710785 0 19843927 21 6 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126593675368 27 31230 31541 4623 12729677 10845232 1884445 0 10008677 2847887 26607 4623 31230 0 75276608 64431376 0 19843927 21 6 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126595912772 28 31230 31541 0 12729677 12729677 0 0 10008677 2847887 31230 0 31230 0 77161053 64431376 0 19447048 21 6 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126595915096 28 31230 31541 0 12729677 12729677 0 0 10008677 2847887 31230 0 31230 0 77161053 64431376 0 19447048 22 6 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126596550020 28 31230 31541 3948 12729677 10564079 2165598 0 10008677 2720999 27282 3948 31230 0 77716454 67152375 0 19447048 22 6 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126596550695 29 31230 31541 3948 12729677 10564079 2165598 0 10008677 2720999 27282 3948 31230 0 77716454 67152375 0 19058107 22 6 173 346 10008483 20016966 0 16777216 nofree newobj 0 0 none
end_mark 1424126599572099 29 31231 31542 311 12730084 12729677 407 0 10012627 2720999 31231 0 31231 0 79882052 67152375 0 19058107 22 7 173 346 10008483 20016966 016777216 nofree newobj 0 0 marking
end_sweep 1424126601548503 29 31541 31542 1973 12856442 12052174 804268 0 10012627 2844344 29568 1973 31541 0 81921598 69869424 0 19058107 22 7 173 346 10008483 20016966 016777216 nofree newobj 0 0 sweeping
start 1424126602263195 30 31541 31542 0 12856442 12856442 0 0 10012627 2844344 31541 0 31541 0 82725866 69869424 0 18676944 22 7 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126602265341 30 31542 31856 314 12856849 12856442 407 0 10008677 2844344 31542 0 31542 0 82725866 69869424 0 18676944 23 7 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126603175460 30 31542 31856 4854 12856849 10878266 1978583 0 10008677 2976283 26688 4854 31542 0 83595454 72717188 0 18676944 23 7 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126605113748 31 31542 31856 0 12856849 12856849 0 0 10008677 2976283 31542 0 31542 0 85574037 72717188 0 18303405 23 7 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126605115927 31 31543 31857 314 12857257 12856849 408 0 10008677 2976283 31543 0 31543 0 85574037 72717188 0 18303405 24 7 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
end_sweep 1424126605955606 31 31543 31857 4855 12857257 10878238 1979019 0 10008677 2976691 26688 4855 31543 0 86443597 75565359 0 18303405 24 7 173 346 10008483 20016966 016777216 0 newobj 0 0 sweeping
start 1424126607549599 32 31543 31857 0 12857257 12857257 0 0 10008677 2976691 31543 0 31543 0 88422616 75565359 0 17937336 24 7 173 346 10008483 20016966 0 16777216 0 newobj 0 0 none
end_mark 1424126607551824 32 31544 31858 314 12857664 12857257 407 0 10008677 2976691 31544 0 31544 0 88422616 75565359 0 17937336 25 7 173 346 10008483 20016966 016777216 0 newobj 0 0 marking
CPU: 114.08s Real: 114.44s RAM: 672072KB
It is hard to read. So I uploaded google apps sheet (https://docs.google.com/spreadsheets/d/1lVbgPWdumC19UMm-n_VxcsRUR9RR6G_E0gVTBxscvzs/edit?usp=sharing).
Introducing first colums:
- type Event type
- tick Phisical time
- count GC count
Event types means:
- start Beggining of GC
- end_mark End of making for this GC
- end_sweep End of sweeping for this GC == End of GC
Btw, there are other events.
Let's see heap_marked_slots
numbers. This numbers show how many objects are marked, in other words, numbers of living objects. They are stable for 10.0M. However, the numbers of heap_available_slots
, this numbers show how many slots are available, are increasing. Why?
Actually, Ruby's GC extends heap if there are no enough free slots. "Enough free slots" is available slots * 0.3. Let's check with real numbers.
- available: 11988251
- marked: 10008677
- free slots: 11988251 - 10008677 = 1979574
- free slot ratio: 1979574/11988251 = 0.16...
So it is less than 0.3. So that Ruby's GC decide to extend heap. Extending by RUBY_GC_HEAP_GROWTH_FACTOR=1.01
, so that available slots are increased from 11988659 to 12107679 for one GC. It is x1.009, almost x1.01.
To prevent heap extension, we need to prepare available slots * 0.7 > marked objects => available slots > marked objects / 0.7 = marked objects * 1.43. Marked objects are stable with 10.1M objects. So we need to prepare, x1.5 (a bit bigger than x1.43) = 15.15M.
Let's try.
$ RUBY_GC_HEAP_INIT_SLOTS=15150000 /usr/bin/time -f "\nCPU: %Us\tReal: %es\tRAM: %MKB" ruby ../../trunk/test.rb...
CPU: 132.72s Real: 133.03s RAM: 770084KB
Checking GC trace result, there are no extension for heap. Congrats!
Wait. 770MB seems too big.
- Available slots: 15.15M * 40B = 606MB
- 10M Array = 10M * 8B = 80MB
- Total: 686MB.
Where remaining 80MB comes from?
Let's check using [valgrind] (http://valgrind.org/). It is very powerful tool to understand the behavior of program. There is a tool named massif to check memory consumption. Let's use it.
Using valgrind/massif (and using my own esoteric script), we can get the following figure. Btw, current ruby doesn't run on valgrind without a trick.
[Fig. result of massif] (http://www.atdot.net/fp_store/f.h4owjn/file.copipa-temp-image.png)
X-axis shows time and Y-axis shows memory consumption (stacking line chart).
Analysis:
- [email protected] consumes 600 MB
- [email protected] consumes 80 MB
- [email protected] consumes 80 MB in maximum, 50 MB in average
We need to check maximum value. Bingo! 600MB + 80MB + 80MB = 760MB, it is almost equals to our previous evaluation.
- 600MB consumed by [email protected] is 15M RVALUEs (40B) = 15M * 40B = 600MB.
- 80MB consumed by [email protected] are used an 10M entry array, of course.
What is last "[email protected] consumes 80 MB in maximum, 50 MB in average"?
Using my own script, we can get more.
[Fig. result of massif / details] (http://www.atdot.net/fp_store/f.foowjn/file.copipa-temp-image.png)
We can find that the function stack_chunk_alloc()
consumes 50MB to 80MB. This function is allocator function for marking stack to process marking phase. It is surprising result.
The following pseudo code shows how marking stack used.
def mark_all
mark_stack = [root_set...]
while obj = mark_stack.pop
mark(obj)
obj.reachable_object{|refered_by_obj|
mark_stack.push refered_by_obj unless refered_by_obj.marked?
}
end
end
In this case, we need to mark an array object contains 10M entries. In this time, 10M entries (strings) are pushed onto the marking stack. Marking stack size will be 8B * 10M = 80MB. It is measured value.
We can solve all of mysteries.
This is demostration how to survey the behavior of Ruby programs. You can use several tools. If you have a trouble, please consider to use them. If you ask us, please give us information, at least, gc_tracer log.
Thanks, Koichi