@SafeAF
Created January 16, 2016 13:50
export RUBY_GC_HEAP_INIT_SLOTS=600000
export RUBY_GC_HEAP_FREE_SLOTS=600000
export RUBY_GC_HEAP_GROWTH_FACTOR=1.25
export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=300000
Or this:
# Boost Ruby
export RUBY_GC_HEAP_INIT_SLOTS=1000000 # 1M
export RUBY_GC_HEAP_FREE_SLOTS=500000 # 0.5M
export RUBY_GC_HEAP_GROWTH_FACTOR=1.1
export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=10000000 # 10M
export RUBY_GC_MALLOC_LIMIT_MAX=1000000000 # 1G
export RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR=1.1
# export RUBY_GC_OLDMALLOC_LIMIT=500000000 # 500M
# export RUBY_GC_OLDMALLOC_LIMIT_MAX=1000000000 # 1G
# export RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR=1.1
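Not from the original gist: a small sanity-check sketch (the script is mine, assuming Ruby 2.1 and the GC.stat keys shown further down) to confirm the exports were actually picked up by the Ruby process you booted.

# check_gc_env.rb -- hypothetical helper, not part of the gist
%w[
  RUBY_GC_HEAP_INIT_SLOTS
  RUBY_GC_HEAP_FREE_SLOTS
  RUBY_GC_HEAP_GROWTH_FACTOR
  RUBY_GC_HEAP_GROWTH_MAX_SLOTS
  RUBY_GC_MALLOC_LIMIT_MAX
  RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
].each do |name|
  puts format("%-40s %s", name, ENV.fetch(name, "(not set)"))
end

# The malloc limits this Ruby is actually using right now (Ruby 2.1 key names):
puts GC.stat[:malloc_limit]      # current malloc limit in bytes
puts GC.stat[:oldmalloc_limit]   # current oldmalloc limit in bytes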
Performance MRI Ruby Patch
$ rvm install 2.1.2 --patch railsexpress -n railsexpress
$ rvm --default use 2.1.2-railsexpress
Run this on Ruby 2.1.1 and you will be out of memory soon:
while true
  "a" * (1024 ** 2) # allocate a 1MB string every iteration and immediately discard it
end
Malloc limits, Ruby and you
Ruby has tracked memory allocation since its very early versions. This is why I found FUD comments such as this troubling:
the issue is that the Ruby GC is triggered on total number of objects, and not total amount of used memory
This is a clear misunderstanding of Ruby. In fact, the aforementioned article never mentions that memory allocation may trigger a GC.
Historically Ruby was quite conservative issuing GCs based on the amount of memory allocated. Ruby keeps track of all memory allocated (using malloc) outside of the Ruby heaps between GCs. In Ruby 2.0, out of the box, every 8MB of allocations will result in a full GC. This number is way too small for almost any Rails app, which is why increasing RUBY_GC_MALLOC_LIMIT is one of the most cargo-culted settings out there in the wild.
Matz picked this tiny number years ago, when it was a reasonable default; however, it was not revised until Ruby 2.1 landed.
For Ruby 2.1 Koichi decided to revamp this sub-system. The goal was to have defaults that work well for both scripts and web apps.
Instead of having a single malloc limit for our app, we now have a starting-point malloc limit that dynamically grows every time we trigger a GC by exceeding the limit. To stop unbounded growth of the limit, max values are set.
We track memory allocations from 2 points in time:
memory allocated outside Ruby heaps since last minor GC
memory allocated since last major GC.
At any point in time we can get a snapshot of the current situation with GC.stat:
> GC.stat
=> {:count=>25,
:heap_used=>263,
:heap_length=>406,
:heap_increment=>143,
:heap_live_slot=>106806,
:heap_free_slot=>398,
:heap_final_slot=>0,
:heap_swept_slot=>25258,
:heap_eden_page_length=>263,
:heap_tomb_page_length=>0,
:total_allocated_object=>620998,
:total_freed_object=>514192,
:malloc_increase=>1572992,
:malloc_limit=>16777216,
:minor_gc_count=>21,
:major_gc_count=>4,
:remembered_shady_object=>1233,
:remembered_shady_object_limit=>1694,
:old_object=>65229,
:old_object_limit=>93260,
:oldmalloc_increase=>2298872,
:oldmalloc_limit=>16777216}
malloc_increase denotes the amount of memory allocated since the last minor GC; oldmalloc_increase, the amount since the last major GC.
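As a small illustration (mine, not from the original post), you can watch both counters climb and then reset as GCs fire, assuming Ruby 2.1's GC.stat keys:

# Watch malloc_increase / oldmalloc_increase climb and reset as GCs fire.
5.times do
  10.times { "a" * (1024 ** 2) }   # roughly 10MB of malloc'd (off-heap) string data
  s = GC.stat
  puts "minor GCs: #{s[:minor_gc_count]}  major GCs: #{s[:major_gc_count]}  " \
       "malloc_increase: #{s[:malloc_increase]}  oldmalloc_increase: #{s[:oldmalloc_increase]}"
end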
We can tune our settings, from "Ruby 2.1 Out-of-Band GC":
RUBY_GC_MALLOC_LIMIT: (default: 16MB)
RUBY_GC_MALLOC_LIMIT_MAX: (default: 32MB)
RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR: (default: 1.4x)
and
RUBY_GC_OLDMALLOC_LIMIT: (default: 16MB)
RUBY_GC_OLDMALLOC_LIMIT_MAX: (default: 128MB)
RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR: (default: 1.2x)
So, in theory, unbounded memory growth is not possible for the script above. The two MAX values should simply cap the growth and force GCs.
However, this is not the case in Ruby 2.1.1.
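To make the intended behaviour concrete, here is a toy model (my sketch, not the actual gc.c logic) of how the limit is supposed to grow by the growth factor and get capped at the MAX value:

# Toy model of the intended limit growth: each time allocations since the last GC
# exceed the limit, a GC fires and the limit grows by GROWTH_FACTOR, never past MAX.
limit  = 16 * 1024 * 1024   # RUBY_GC_MALLOC_LIMIT default (16MB)
max    = 32 * 1024 * 1024   # RUBY_GC_MALLOC_LIMIT_MAX default (32MB)
factor = 1.4                # RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR default

6.times do |i|
  puts "after GC #{i}: limit = #{limit} bytes"
  limit = [(limit * factor).to_i, max].min
end
# The limit converges on 32MB, so a loop that keeps allocating big strings should
# keep hitting the limit and triggering GCs rather than growing without bound.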
Investigating the issue
We spent a lot of time ensuring we had extensive instrumentation built into Ruby 2.1: we added memory profiling hooks, we added GC hooks, and we exposed a large amount of internal information. This has certainly paid off.
Analyzing the issue raised by this mini script is trivial using the gc_tracer gem. This gem allows us to get a very detailed snapshot of the system every time a GC is triggered and store it in a text file, easily consumable by a spreadsheet.
We simply add this to the rogue script:
require 'gc_tracer'
GC::Tracer.start_logging("log.txt")
And we get a very detailed trace back in the text file, one row per GC event.
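gc_tracer also supports a block form if you want to scope the logging to just the rogue loop; this is a sketch from memory of the gem's README, so double-check the exact API:

require 'gc_tracer'

# Block form: logging starts here and stops automatically when the block returns.
GC::Tracer.start_logging("log.txt") do
  1_000.times { "a" * (1024 ** 2) }
end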
Installs tested
rvm install 1.9.3-p385
rvm install 1.9.3-p385 --patch https://github.com/funny-falcon/ruby/compare/p385...p385_falcon.diff
rvm install 2.0.0
Note: this is an in-progress post; feel free to comment, but it will grow quite a bit as I flesh it out.
Environments tested
default
RUBY_GC_MALLOC_LIMIT=90000000 (GC)
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 (GC+TC)
RAILS_ENV=production bundle exec thin start -p 3111
RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 bundle exec thin start -p 3111
RAILS_ENV=production RUBY_GC_MALLOC_LIMIT=90000000 LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.0.1.0 bundle exec thin start -p 3111
(I tried a bunch of other "recommended" optimisations; most of them made little to no difference.)
Test rig
Production: Intel(R) Xeon(R) CPU E3-1280 V2 @ 3.60GHz
Test performed
Results for: ab -n 100 -c 1 http://localhost:3111/ (times in ms)
first run discarded
rvm use 1.9.3-p385
RAILS_ENV=production bundle exec thin start -p 3111
min mean[+/-sd] median max
Default: 209 260 15.3 263 297
GC : 84 135 32.1 139 205
GC+TC : 76 124 27.1 131 178
rvm install 1.9.3-p385 --patch falcon
min mean[+/-sd] median max
Default: 72 131 49.1 124 540
GC : 70 120 18.9 126 158
GC+TC: 67 121 40.4 116 395
rvm use 2.0.0
min mean[+/-sd] median max
Default: 119 158 25.3 154 211
GC : 65 120 36.1 118 221
GC+TC : 67 113 20.0 114 175
What is the deal with RUBY_GC_MALLOC_LIMIT?
This is one of the most poorly documented bits of optimisation out there:
github.com/ruby/ruby/blob/trunk/gc.c#L68-L70
github.com/ruby/ruby/blob/trunk/gc.c#L3479-L3483
Out of the box, Ruby forces a GC every time 8MB of memory is allocated. This is insanely low: our front page caused about 32MB of allocation, thus causing 4 GCs. Bumping this up to allow 90MB of allocation before forcing a GC puts much less pressure on the GC; we are able to handle 3 or so requests without a GC.
As a rule we want the number large, but not too large: if it's too big we will bleed way too much memory and GC will take forever; if it's too small, as it is out of the box, we will have multiple GCs per page, which is horrible on so many levels.
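A quick way to see this for yourself (my snippet, not from the original post): count how many GCs a chunk of work triggers, then re-run it with different RUBY_GC_MALLOC_LIMIT values. The gcs_during helper is hypothetical.

# Hypothetical helper: count GCs triggered while the block runs.
def gcs_during
  before = GC.stat[:count]
  yield
  GC.stat[:count] - before
end

# Roughly the ~32MB of allocation the front page causes.
puts gcs_during { 32.times { "a" * (1024 ** 2) } }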
What is the impact of raising RUBY_GC_MALLOC_LIMIT to 90 million?
On our test rig, resident memory rose from 116MB to 193MB for the GC setting alone, and up to 211MB when combined with tcmalloc.
What do you recommend?
I have 2 recommendations depending on your stance:
Conservative: all conservative installs should use RUBY_GC_MALLOC_LIMIT=90000000; it's a no-brainer, and the increased memory usage is well worth the extra perf. You get a 48% decrease in median request time just by doing that.
Bleeding Edge: I would recommend using Ruby 2.0 with memory allocated via tcmalloc. You get a 57% decrease in median request time using that.
I can no longer recommend the falcon patches, since most of the patches are in 2.0 anyway; embrace the future.
Issues with this test
The test was performed on a CPU that has Turbo Boost enabled. This means that clock speed varies based on load, which made it more complicated to get consistent test runs. That said, I repeated all the tests multiple times, and they were run on our production boxes.
Why are we eating so much memory per request?
The easiest thing to answer here is what kind of memory we are consuming (by adding ?pp=profile-gc to the request):
230 THOUSAND strings allocated during one request
It is going to take us a while to work through this to find the actual culprits, though quite a few of the strings allocated over and over again do seem to be coming from Active Record, so focusing on cutting the amount of querying is a sure way to reduce string allocations during a request life cycle.
What about Rails 4?
We need to re-test this under Rails 4 to see if it's faster or slower, and determine whether string allocations have gone up or down.
Are you able to reproduce these allocations on OS X? If so (since you're able to use Ruby 2.0), you should be able to find the allocations pretty easily with DTrace. Use this script (I'm writing this from memory, so it may need tweaks):
/* aggregate string allocations by file name (arg1) and line number (arg2) */
ruby$target:::string-create
{
  @_[copyinstr(arg1), arg2] = count();
}
Start your Rails process, run a few request, then attach to the pid with DTrace, run another request, and kill DTrace. You should end up with aggregate counts of file name / line number as the key, and number of string allocations done at that location.
The biggest offenders according to the probes appear to be:
# this MiniProfiler monkey patch
def values(*args, &blk)
  return values_without_profiling(*args, &blk) unless @miniprofiler_sql_id

# and this AR method
def hash_rows
  @hash_rows ||= @rows.map { |row|
    # this string allocation
    Hash[@columns.zip(row)]
  }
end
One way AR could get rid of this double allocation would be to nuke the underlying hash in favor of a real object (tm) and cache some sort of deserializer.
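A rough sketch of that idea (mine; the class names LightResult and RowView are made up, and this is not AR's actual fix): build one column-index map per result set and hand each row a lightweight wrapper instead of materialising a Hash per row.

# Sketch of the "real object + cached deserializer" shape, not AR's implementation.
class RowView
  def initialize(index, row)
    @index = index   # column name => position, shared by every row
    @row   = row
  end

  def [](column)
    pos = @index[column]
    pos && @row[pos]
  end
end

class LightResult
  def initialize(columns, rows)
    @columns = columns
    @rows    = rows
    # Built once per result set instead of allocating one Hash per row.
    @index = {}
    columns.each_with_index { |c, i| @index[c] = i }
  end

  def hash_rows
    @hash_rows ||= @rows.map { |row| RowView.new(@index, row) }
  end
end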
A HUGE packet of these string allocations could go away if pg supported a mode where it returned coerced types as opposed to strings.
I wonder if the MiniProfiler monkey patch should really be attributed as the "responsible party" for the allocation; the allocation actually happened a fair bit deeper, in the C extension.
I can confirm this fix removes that huge number of allocations; it is a huge perf boost for AR.
A very big reason for the excessive allocations is us. For example, we run:
SELECT "topics".* FROM "topics" WHERE ("topics"."deleted_at" IS NULL) AND (topics.archetype <> 'private_message') ORDER BY CASE WHEN topics.category_id IS NULL and topics.pinned THEN 0 ELSE 1 END, topics.bumped_at DESC LIMIT 30
We are not limiting the query to the columns we actually need. This table has 38 columns, so the pg gem is allocating at least double the number of strings we need.
The natural thing is to add a .select there, but my concern is that if we forget a column the failure mode is very problematic: stuff will not "explode", it will fail silently, pretending the missing columns are nil. I wonder if we should add some special mode in dev to assist here. Regardless, we will fix this.
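For illustration only (the column list is an abbreviated guess, not the real one), the change and the failure mode I'm worried about look roughly like this:

# Only pull the columns the topic list actually renders (illustrative subset).
topics = Topic.select([:id, :title, :category_id, :pinned, :bumped_at])
              .where(deleted_at: nil)
              .limit(30)

topics.first.title       # fine, the column was selected
topics.first.created_at  # per the concern above: a forgotten column may quietly
                         # come back as nil instead of blowing up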
Additionally, we have an N+1 query hanging off category.
I think I will add a mode to MiniProfiler that allows you to see object creation counts per query; that will allow people to attack the biggest offenders.
The very frustrating thing about this topic table is that only 2 out of the 38 columns are actually strings.
I profiled the simple:
Topic.limit(30).to_a
(I ran it twice before in production to ensure it was warmed up) and got the table below. The amount of string allocations still feels excessive (even after the patch; I patched locally).
File / Line Number / Allocations.
-------------------------------------
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract_adapter.rb 276 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 4 1
/gems/activerecord-3.2.12/lib/active_record/log_subscriber.rb 8 1
/gems/activerecord-3.2.12/lib/active_record/querying.rb 38 1
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 53 1
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 119 1
/gems/arel-3.0.2/lib/arel.rb 43 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 126 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 131 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 135 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 139 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 192 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 200 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 283 1
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 339 1
test.rb 4 1
test.rb 7 1
/gems/activerecord-3.2.12/lib/active_record/relation/query_methods.rb 268 2
/gems/activerecord-3.2.12/lib/active_record/scoping/default.rb 123 2
/gems/arel-3.0.2/lib/arel.rb 39 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 134 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 136 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 187 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 282 2
/gems/arel-3.0.2/lib/arel/visitors/to_sql.rb 366 2
test.rb 6 2
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/abstract/connection_specification.rb 149 3
/gems/activesupport-3.2.12/lib/active_support/log_subscriber.rb 91 3
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 130 4
/gems/activerecord-3.2.12/lib/active_record/scoping.rb 126 5
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 14 6
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 26 8
/gems/activerecord-3.2.12/lib/active_record/relation/spawn_methods.rb 52 12
/gems/activerecord-3.2.12/lib/active_record/base.rb 514 30
/gems/activerecord-3.2.12/lib/active_record/model_schema.rb 162 30
/gems/activerecord-3.2.12/lib/active_record/relation.rb 26 32
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 665 38
/gems/activerecord-3.2.12/lib/active_record/result.rb 33 38
/gems/activerecord-3.2.12/lib/active_record/relation.rb 27 40
/gems/activerecord-3.2.12/lib/active_record/inheritance.rb 68 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 81 60
/gems/activesupport-3.2.12/lib/active_support/callbacks.rb 414 180
/gems/activerecord-3.2.12/lib/active_record/connection_adapters/postgresql_adapter.rb 605 910
Sam Saffron (co-founder), Mar '13:
More profiling (both warmed up in production):
100.times do
Topic.limit(30).to_a
end
vs.
100.times do
ActiveRecord::Base.connection.raw_connection.async_exec("select * from topics limit 30").values
end
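To put numbers on the two, a harness like this would work (mine, not part of the original post):

require 'benchmark'

Benchmark.bmbm do |x|
  x.report("ActiveRecord") do
    100.times { Topic.limit(30).to_a }
  end

  x.report("raw pg") do
    100.times do
      ActiveRecord::Base.connection.raw_connection
                        .async_exec("select * from topics limit 30").values
    end
  end
end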
sampler:
/* remember the class (arg0) and method (arg1) of the most recent C method entry */
ruby*:::cmethod-entry
{
  k = copyinstr(arg0);
  m = copyinstr(arg1);
}

/* sample 5000 times a second and count the class/method pair we were last in */
tick-5000hz
/k != 0/
{
  @[k, m] = count();
}
AR
Hash each 72
String hash 82
Kernel freeze 87
PG::Result ftype 88
Thread [] 89
Module name 94
Array zip 107
Array each 120
Thread current 123
Class allocate 131
Fixnum abs 136
String intern 147
Hash delete 150
Kernel instance_variable_set 183
Hash []= 195
String to_s 195
String initialize 237
String to_sym 238
Kernel respond_to? 247
Fixnum to_s 286
Class superclass 349
PG::Connection async_exec 354
PG::Result values 383
#<Class:0x007ff6e8b9e510> relation 399
Module === 415
Kernel block_given? 450
Kernel hash 512
Kernel class 517
Hash [] 566
Symbol to_s 586
Hash key? 6810
Raw
Mutexd current 1
Thread [] 1
Mutex unlock 3
Module name 8
Thread current 13
PG::Result values 78
PG::Connection async_exec 266
My particular concern here is all these extraneous calls to .key?; it seems the sampler is spending a lot of time there.
These key? calls are all coming from:
2 143160 vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/locking/optimistic.rb 174
2 143160 vm_call_cfunc:cmethod-entry Hash key? /Users/sam/.rvm/gems/ruby-2.0.0-p0/gems/activerecord-3.2.12/lib/active_record/attribute_methods/serialization.rb 66

GC tuning settings by Ruby version:

Ruby 1.9.2 (source): GC_MALLOC_LIMIT, HEAP_MIN_SLOTS, FREE_MIN (hard-coded, but customizable via environment variables with this patch)

Ruby 1.9.3 (source): RUBY_GC_MALLOC_LIMIT, RUBY_HEAP_MIN_SLOTS, RUBY_FREE_MIN

Ruby 2.0.0 (source): same as Ruby 1.9.3

Ruby 2.1.0 (source):
  RUBY_GC_HEAP_INIT_SLOTS (obsoletes RUBY_HEAP_MIN_SLOTS)
  RUBY_GC_HEAP_FREE_SLOTS (obsoletes RUBY_FREE_MIN)
  RUBY_GC_HEAP_GROWTH_FACTOR (new)
  RUBY_GC_HEAP_GROWTH_MAX_SLOTS (new)
  RUBY_GC_MALLOC_LIMIT
  RUBY_GC_MALLOC_LIMIT_MAX (new)
  RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR (new)
  RUBY_GC_OLDMALLOC_LIMIT (new)
  RUBY_GC_OLDMALLOC_LIMIT_MAX (new)
  RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR (new)

Ruby 2.1.1 (source): adds RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR (new)
