When developing a program in Ruby, you may sometimes encounter a memory leak. For a while now, Ruby has had a facility to gather information about which objects are lying around: ObjectSpace.
There are several approaches one can take to debug a leak. This document discusses a time-based approach, where a full memory dump is generated every, say, 5 minutes, while the memory leak is showing up. Afterwards, one can look at all the objects and find out which ones are staying around, causing the memory leak.
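To make "staying around" concrete, here is a minimal sketch that works directly on three consecutive dump files instead of the database used later on. It assumes the JSON-lines format written by ObjectSpace.dump_all, where each object has an `address` field; the helper names `dump_addresses` and `leak_candidates` are made up for this illustration.

```ruby
require 'json'
require 'set'

# Collect the addresses of all objects recorded in one heap dump.
# `lines` is any enumerable of JSON lines, e.g. File.foreach(path).
def dump_addresses(lines)
  lines.each_with_object(Set.new) do |line, addresses|
    address = JSON.parse(line)['address']
    addresses << address if address # ROOT entries carry no address
  end
end

# Objects that first appear in the second dump and are still present
# in the third dump are candidates for the leak.
def leak_candidates(first, second, third)
  (dump_addresses(second) - dump_addresses(first)) & dump_addresses(third)
end
```

This is only a heuristic: Ruby can reuse an address after an object has been collected, so a match on address alone does not prove it is the same object.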
Set up your Ruby application to dump all objects to a file. If you have an event loop, something like this would work:
require 'objspace'

def heap_dump
  GC.start # collect garbage first, so only live objects end up in the dump
  i = Time.now.strftime('%s')
  path = "/tmp/ruby-heap-#{i}.dump"
  File.open(path, "w") do |io|
    ObjectSpace.dump_all(output: io)
  end
  # On Heroku you'll need to push it elsewhere, like S3
  #s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY'], secret_access_key: ENV['S3_SECRET_KEY'])
  #bucket = s3.buckets['qm-import-export']
  #obj = bucket.objects["ruby-heap-#{i}.jsonl"]
  #obj.write(IO.binread(path))
end

# record file and line of each allocation, so the dumps include them
ObjectSpace.trace_object_allocations_start

mainloop do
  # assuming your mainloop does the work, and calls this block every 5 minutes
  heap_dump
end
Or, if you have a Rails app, do this in a controller that you visit every 5 minutes:
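# app/controllers/heap_dumps_controller.rb
require 'objspace'

class HeapDumpsController < ActionController::Metal
  def heap_dump
    if ENV['HEAP_DUMP'] == '1' && params[:token].to_s == ENV['HEAP_DUMP_TOKEN']
      write_heap_dump
      self.response_body = 'Dumped heap'
    else
      self.status = 401
      self.response_body = 'Invalid token'
    end
  end

  private

  # Named differently from the action: calling heap_dump here would recurse forever.
  def write_heap_dump
    GC.start
    File.open("/tmp/ruby-heap-#{Time.now.strftime('%s')}.dump", "w") do |io|
      ObjectSpace.dump_all(output: io)
    end
  end
end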
# add to config/routes.rb
get "/heap_dump", to: HeapDumpsController.action(:heap_dump)
# config/initializers/heap_dump_tracing.rb
# ENV values are always strings, so compare with '1' rather than 1
if ENV['HEAP_DUMP'] == '1'
  require 'objspace'
  ObjectSpace.trace_object_allocations_start
end
To analyse the dumps:
- With Ruby installed, install the dependencies with
  bundle install
- With PostgreSQL installed, create the database with
  createdb mem_analysis
- When getting dumps from Amazon S3, s3cmd may come in handy.
If the dumps are stored on S3, get the dump list first. Update the bucket in S3_URL and the date in the grep command to reflect your case. This stores filenames and dates in index.txt.
S3_URL=s3://qm-import-export/
s3cmd ls $S3_URL | grep '^2015-11-23' | sed 's/[0-9]\+\s\+s3:.*\///' >index.txt
Then download them:
for file in $(awk '{print $3}' index.txt); do s3cmd get "${S3_URL}${file}" "$file"; done
Initialize the database:
bundle exec ruby createdb.rb
Because importing can take quite a while, this is split into two steps: first convert each dump file with gencsv.rb, then load everything into the database with genimport.sh:
bundle exec ruby gencsv.rb
sh genimport.sh | psql mem_analysis
Now that the database is loaded, we're ready to gather information.
To find out what is causing a memory leak, we can look at graphs plotting memory usage over time in different dimensions.
This is done by graph.rb. Let's start with the object type.
bundle exec ruby graph.rb type-mem
This will create the file graph-type-mem.png showing the total size of objects by type. If there's one thing leaking, you'll probably have a number of somewhat flat lines, and one with a positive slope, which is the culprit.
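For a quick look at the same dimension without the database, the totals per type can also be computed straight from a dump file. The sketch below is not how graph.rb works; it only assumes the `type` and `memsize` fields that ObjectSpace.dump_all writes for each object, and the helper name `memsize_by_type` is made up for this illustration.

```ruby
require 'json'

# Sum the reported memory size per object type for one heap dump.
# `lines` is any enumerable of JSON lines, e.g. File.foreach(path).
def memsize_by_type(lines)
  totals = Hash.new(0)
  lines.each do |line|
    object = JSON.parse(line)
    # entries without a memsize (such as ROOT) count as 0
    totals[object['type']] += object['memsize'].to_i
  end
  totals
end

# Possible usage, one summary per dump in chronological order:
#   Dir['/tmp/ruby-heap-*.dump'].sort.each do |path|
#     puts "#{path}: #{memsize_by_type(File.foreach(path))}"
#   end
```

A type whose total keeps growing from one dump to the next corresponds to the line with a positive slope in the graph.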
Then create a similar graph for that object type only, and plot lines by file, for example. This gives an idea of which gem may be creating the leaking objects. If it's a string, run
bundle exec ruby graph.rb string-mem
If it's something else, edit graph.rb and expand the case-block. In this way you may be able to zoom in on the cause.
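The per-file breakdown can likewise be sketched directly against a dump file. This assumes ObjectSpace.trace_object_allocations_start was active when the objects were allocated, since only then does the dump contain a `file` field; the helper name `memsize_by_file` and the '(unknown)' label are made up for this illustration, and graph.rb itself may group differently.

```ruby
require 'json'

# Sum the reported memory size per allocating file, for one object type.
# `lines` is any enumerable of JSON lines, e.g. File.foreach(path).
def memsize_by_file(lines, type: 'STRING')
  totals = Hash.new(0)
  lines.each do |line|
    object = JSON.parse(line)
    next unless object['type'] == type
    # objects allocated before tracing started have no file recorded
    totals[object['file'] || '(unknown)'] += object['memsize'].to_i
  end
  # largest total first
  totals.sort_by { |_file, size| -size }.to_h
end
```

Running this on an early and a late dump and comparing the top entries shows which file's allocations are growing.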