@joshRpowell
Forked from wvengen/README.md
Created July 25, 2020 12:10
Ruby memory analysis over time

Finding a Ruby memory leak using a time analysis

When developing a program in Ruby, you may sometimes encounter a memory leak. For a while now, Ruby has had a facility to gather information about which objects are lying around: ObjectSpace.

There are several approaches one can take to debug a leak. This guide discusses a time-based approach, where a full memory dump is generated every, say, 5 minutes while the memory leak is showing up. Afterwards, one can look at all the objects and find out which ones are staying around, causing the memory leak.

Gather

Set up your Ruby application to dump all objects to a file. If you have an event loop, something like this would work:

require 'objspace'

def heap_dump
  # collect garbage first, so only live objects end up in the dump
  GC.start

  i = Time.now.strftime('%s')
  path = "/tmp/ruby-heap-#{i}.dump"

  File.open(path, "w") do |io|
    ObjectSpace.dump_all(output: io)
  end

  # On Heroku you'll need to push it elsewhere, like S3
  #s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY'], secret_access_key: ENV['S3_SECRET_KEY'])
  #bucket = s3.buckets['qm-import-export']
  #obj = bucket.objects["ruby-heap-#{i}.jsonl"]
  #obj.write(IO.binread(path))
end

ObjectSpace.trace_object_allocations_start
mainloop do
  # assuming your mainloop does the work, and calls this block every 5 minutes
  heap_dump
end
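
Each line written by ObjectSpace.dump_all is one JSON object describing one heap object. To get a feel for the fields a dump will contain, the following sketch dumps a single string with ObjectSpace.dump and parses the result. The variable names are only illustrative, and the exact set of keys may differ between Ruby versions.

```ruby
require 'objspace'
require 'json'

# Tracing must be active when the object is allocated,
# otherwise no file/line information is recorded for it.
ObjectSpace.trace_object_allocations_start
sample = 'heap-' + 'sample' # built at runtime, so the allocation is traced

# ObjectSpace.dump returns one JSON document, shaped like one dump_all line.
info = JSON.parse(ObjectSpace.dump(sample))
ObjectSpace.trace_object_allocations_stop

puts info['type']    # object type, here STRING
puts info['memsize'] # memory used by the object, in bytes
puts info['file']    # file in which the object was allocated
puts info['line']    # line on which the object was allocated
```

The file and line fields are what later makes it possible to group leaking objects by where they were created.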

Or, if you have a Rails app, do this in a controller that you visit every 5 minutes:

# app/controllers/heap_dumps_controller.rb
class HeapDumpsController < ActionController::Metal

  # The action is named `dump` rather than `heap_dump`, so that calling the
  # heap_dump helper shown earlier (which must be loaded in the app) does not
  # recurse into this action.
  def dump
    if ENV['HEAP_DUMP'] == '1' && params[:token].to_s == ENV['HEAP_DUMP_TOKEN']
      heap_dump
      self.response_body = 'Dumped heap'
    else
      self.status = 401
      self.response_body = 'Invalid token'
    end
  end
end

# add to config/routes.rb
get "/heap_dump", to: HeapDumpsController.action(:dump)

# config/initializers/heap_dump_tracing.rb
if ENV['HEAP_DUMP'] == '1' # ENV values are always strings
  require 'objspace'
  ObjectSpace.trace_object_allocations_start
end
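
Visiting the controller every 5 minutes can be automated. The following is a minimal sketch using Net::HTTP from the standard library; the helper names heap_dump_uri and poll_heap_dump, the base URL and the default interval are assumptions made for illustration, not part of the original setup.

```ruby
require 'net/http'
require 'uri'

# Builds the URL of the heap dump endpoint (hypothetical helper).
# base_url and token are placeholders for your own app and secret.
def heap_dump_uri(base_url, token)
  uri = URI.join(base_url, '/heap_dump')
  uri.query = URI.encode_www_form(token: token)
  uri
end

# Requests the endpoint `times` times, sleeping `interval` seconds in between.
def poll_heap_dump(base_url, token, interval: 300, times: 12)
  times.times do |n|
    response = Net::HTTP.get_response(heap_dump_uri(base_url, token))
    puts "#{Time.now} #{response.code} #{response.body}"
    sleep interval unless n == times - 1
  end
end
```

Calling poll_heap_dump('https://your-app.example', ENV['HEAP_DUMP_TOKEN']) would then produce twelve dumps over roughly an hour. A cron job running curl would do equally well.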

Install

  • With Ruby installed, install the dependencies with bundle install.
  • With PostgreSQL installed, create the database with createdb mem_analysis.
  • When getting dumps from Amazon S3, s3cmd may come in handy.

Import

If stored on S3, get the dump list. Update the bucket and date in the grep command to reflect your case. This stores filenames and dates in index.txt.

S3_URL=s3://qm-import-export/
s3cmd ls $S3_URL | grep '^2015-11-23' | sed 's/[0-9]*\+\s\+s3:.*\///' >index.txt
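
The sed expression is a bit cryptic, so here is a sketch of the same transformation in Ruby. It assumes that each line of s3cmd ls output has the layout date, time, size, URL; the function name index_line is made up for this example.

```ruby
# Converts one line of `s3cmd ls` output into an index.txt line.
# Assumed input layout: "<date> <time> <size> s3://<bucket>/<key>".
def index_line(ls_line)
  date, time, _size, url = ls_line.split
  # directory entries and other lines without an s3:// URL are skipped
  return nil unless url && url.start_with?('s3://')
  "#{date} #{time} #{File.basename(url)}"
end

puts index_line('2015-11-23 10:05   12345   s3://qm-import-export/ruby-heap-1448273100.jsonl')
```

Each resulting line holds the date, the time and the dump filename, which is the format the later import step reads from index.txt.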

Then download them:

for file in $(awk '{print $3}' index.txt); do s3cmd get "${S3_URL}${file}" "$file"; done

Initialize the database:

bundle exec ruby createdb.rb

Because importing can take quite a while, this is split into two steps: converting each file to CSV, and loading all CSV files into the database:

bundle exec ruby gencsv.rb
sh genimport.sh | psql mem_analysis
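
The second step only generates SQL that psql then executes: one \COPY statement per CSV file, with the column list taken from the CSV header. As a rough sketch of that idea in Ruby (the function name copy_statement is an assumption for this example; the real work is done by genimport.sh):

```ruby
# Builds the psql \COPY statement for one CSV file (hypothetical helper).
# Files ending in .refs.csv hold references, all others hold objects.
def copy_statement(csv_path)
  table = csv_path.end_with?('.refs.csv') ? 'space_object_references' : 'space_objects'
  # the first line of the CSV names the columns to load
  header = File.foreach(csv_path).first.to_s.chomp
  "\\COPY #{table} (#{header}) FROM '#{csv_path}' WITH (FORMAT CSV, HEADER);"
end
```

Because the column list comes from the header, the CSV files do not need to contain every column of the table.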

Analyse

Now that the database is loaded, we're ready to gather information. To find out what is causing a memory leak, we can look at graphs plotting memory usage over time in different dimensions. This is done by graph.rb. Let's start with the object type.

bundle exec ruby graph.rb type-mem

This will create the file graph-type-mem.png showing the total size of objects by type. If there's one thing leaking, you'll probably have a number of somewhat flat lines, and one with a positive slope, which is the culprit.
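
If the lines are hard to tell apart by eye, the slope can also be estimated numerically. The sketch below is not part of graph.rb; it assumes the data has been arranged as a hash from object type to [time, value] points, fits a least-squares line through each series and sorts by slope. It uses Enumerable#sum, which needs Ruby 2.4 or newer.

```ruby
# Least-squares slope of a list of [x, y] points.
def slope(points)
  n = points.size.to_f
  return 0.0 if n < 2
  mean_x = points.sum { |x, _| x } / n
  mean_y = points.sum { |_, y| y } / n
  num = points.sum { |x, y| (x - mean_x) * (y - mean_y) }
  den = points.sum { |x, _| (x - mean_x)**2 }
  den.zero? ? 0.0 : num / den
end

# Returns [key, slope] pairs, steepest growth first.
def rank_by_growth(series)
  series.map { |key, points| [key, slope(points)] }.sort_by { |_, s| -s }
end

# Made-up sample data: seconds since the first dump, total memsize.
series = {
  'STRING' => [[0, 10], [300, 20], [600, 30]],
  'ARRAY'  => [[0, 5], [300, 5], [600, 5]],
  'HASH'   => [[0, 8], [300, 7], [600, 8]]
}
rank_by_growth(series).each { |key, s| puts format('%-8s %.4f per second', key, s) }
```

The series at the top of the ranking is the first candidate to look into.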

Then create a similar graph for that object type only, plotting one line per source file, for example. This gives an idea of which gem the leaking objects may be created in. If it's a string, run

bundle exec ruby graph.rb string-mem

If it's something else, edit graph.rb and add an entry for that type to the case statement. In this way you may be able to zoom in on the cause.
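
For a quick look at a single dump without going through the database, the same kind of grouping can be done directly on the dump file. This is a sketch under the assumption that the dump was written with allocation tracing enabled, so that objects carry a file field; memsize_by_file is a name made up for this example.

```ruby
require 'json'

# Sums memsize per allocation file for all objects of one type in a dump.
# Returns [file, bytes] pairs, largest first.
def memsize_by_file(dump_path, type: 'STRING')
  totals = Hash.new(0)
  File.foreach(dump_path) do |line|
    obj = JSON.parse(line)
    next unless obj['type'] == type
    # objects allocated before tracing started have no file
    totals[obj['file'] || '(unknown)'] += obj['memsize'].to_i
  end
  totals.sort_by { |_, bytes| -bytes }
end
```

Running this on an early and a late dump and comparing the top entries gives a rough, single-point version of what the string-mem graph shows over time.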

Sample

[Figure: graph-type-count, a sample graph produced by graph.rb type-count]

#!/usr/bin/env ruby
# createdb.rb
require_relative 'db'

init_database

#!/usr/bin/env ruby
# db.rb
require 'active_record'

ActiveRecord::Base.establish_connection({adapter: 'postgresql', database: 'mem_analysis'})

def connection
  ActiveRecord::Base.connection
end

class SpaceObject < ActiveRecord::Base
  self.inheritance_column = 'zoink' # use type as ordinary column (not STI)
  has_many :references, class_name: 'SpaceObjectReference', foreign_key: 'from_id', inverse_of: 'from', dependent: :destroy
  # default_address on this row points at the address of another object
  belongs_to :default, class_name: 'SpaceObject', foreign_key: 'default_address', primary_key: 'address'
end

class SpaceObjectReference < ActiveRecord::Base
  belongs_to :from, class_name: 'SpaceObject', required: true, inverse_of: 'references'
  belongs_to :to, class_name: 'SpaceObject', foreign_key: 'to_address', primary_key: 'address'
end

# Drops all existing tables and recreates the schema.
def init_database(c = connection)
  c.tables.each {|t| c.drop_table(t) }
  c.create_table 'space_objects' do |t|
    t.datetime :time
    t.string :type
    t.string :node_type
    t.string :root
    t.string :address
    t.text :value
    t.string :klass
    t.string :name
    t.string :struct
    t.string :file
    t.string :line
    t.string :method
    t.integer :generation
    t.integer :size
    t.integer :length
    t.integer :memsize
    t.integer :bytesize
    t.integer :capacity
    t.integer :ivars
    t.integer :fd
    t.string :encoding
    t.string :default_address
    t.boolean :freezed
    t.boolean :fstring
    t.boolean :embedded
    t.boolean :shared
    t.boolean :flag_wb_protected
    t.boolean :flag_old
    t.boolean :flag_long_lived
    t.boolean :flag_marking
    t.boolean :flag_marked
  end
  c.create_table 'space_object_references' do |t|
    t.integer :from_id, null: false
    t.string :to_address, null: false
  end
  restore_indexes
  nil
end

def remove_indexes(c = connection)
  c.indexes('space_objects').each {|i| c.remove_index('space_objects', name: i.name) }
  c.indexes('space_object_references').each {|i| c.remove_index('space_object_references', name: i.name) }
end

def restore_indexes(c = connection)
  c.change_table 'space_objects' do |t|
    t.index :time
    t.index :address
    t.index :type
    t.index [:klass, :method]
    t.index [:file, :line]
    t.index :size
    t.index :memsize
  end
  c.execute('VACUUM ANALYZE')
end

# Gemfile
source 'https://rubygems.org'

gem 'pg', '~> 0.18.4'
gem 'activerecord', '~> 4.2.5'
gem 'ruby-progressbar', '~> 1.7.5'
gem 'gnuplot', '~> 2.6.2'

#!/usr/bin/env ruby
# gencsv.rb
require 'ruby-progressbar'
require 'json'
require 'csv'

# Yields each parsed dump line together with the total number of lines.
def parse_dump(filename, &block)
  lines = File.readlines(filename)
  lines.each do |line|
    block.call JSON.parse(line), lines.count
  end
end

# Yields the dump filename and its "date time" string for each index line.
def parse_index(filename, &block)
  File.foreach(filename) do |line|
    date, time, dumpname = line.split(/\s+/)
    block.call dumpname, "#{date} #{time}"
  end
end

FIELDS = %w(time type node_type root address value klass name struct file line method generation size length memsize bytesize capacity ivars fd encoding default_address freezed fstring embedded shared flag_wb_protected flag_old flag_long_lived flag_marking flag_marked)
REF_FIELDS = %w(id from_id to_address)

# The object id is not part of FIELDS, so from_id in the refs CSV only matches
# when the CSVs are loaded into an empty table in the order they were generated.
id = 1
ref_id = 1

parse_index('index.txt') do |file, time|
  next if ARGV.any? && !ARGV.include?(file)
  progressbar = ProgressBar.create(title: file, format: "%t |%B| %c/%C %E", throttle_rate: 0.5)
  basename = file.sub(/\.jsonl$/i, '')
  CSV.open(basename + '.csv', 'w') do |csv|
    csv << FIELDS
    CSV.open(basename + '.refs.csv', 'w') do |ref_csv|
      ref_csv << REF_FIELDS
      parse_dump(file) do |data, count|
        progressbar.total = count
        data['value'] = data['value'].gsub(/[^[:print:]]/, '.') if data['value'] # allow string database column
        data['klass'] = data.delete('class') if data['class'] # avoid error
        data['freezed'] = data.delete('frozen') if data['frozen'] # idem
        data['default_address'] = data.delete('default') if data['default'] # consistency
        data['time'] = time
        data['id'] = id
        (data.delete('flags') || {}).each {|k, v| data["flag_#{k}"] = v }
        refs = data.delete('references') || []
        csv << FIELDS.map {|f| data[f]}
        refs.each do |ref|
          ref_csv << [ref_id, id, ref]
          ref_id += 1
        end
        id += 1
        progressbar.increment
      end
    end
  end
end

#!/bin/sh
# genimport.sh
for file in *.csv; do
  table=space_objects
  echo "$file" | grep -q '\.refs\.csv$' && table=space_object_references
  echo "\\COPY $table ($(head -n1 "$file")) FROM '$file' WITH (FORMAT CSV, HEADER);"
done
echo "VACUUM ANALYZE;"

#!/usr/bin/env ruby
# graph.rb
require 'date'
require 'yaml'
require 'gnuplot'
require_relative 'db'

### Parse arguments
type = ARGV[0]
type = 'type-mem' if type == 'type'
case type
when 'type-count'
  ylabel = 'count'
  query, ycolumn, group = nil, 'COUNT(id)', :type
  key_pos = 'left top'
when 'type-mem'
  query, ycolumn, group = nil, 'SUM(memsize)', :type
  ylabel, yscale = 'memsize [MB]', 1024*1024
  key_pos = 'left top'
when 'string-count'
  ylabel = 'count'
  query, ycolumn, group = {type: 'STRING'}, 'COUNT(id)', :file
when 'string-mem'
  query, ycolumn, group = {type: 'STRING'}, 'SUM(memsize)', :file
  ylabel, yscale = 'memsize [MB]', 1024*1024
when 'data-count'
  ylabel = 'count'
  query, ycolumn, group = {type: 'DATA'}, 'COUNT(id)', :file
when 'data-mem'
  query, ycolumn, group = {type: 'DATA'}, 'SUM(memsize)', :file
  ylabel, yscale = 'memsize [MB]', 1024*1024
else
  STDERR.puts "Usage: graph <type>"
  exit 1
end

xoffset = 60*60 # GMT+1
graph_basename = File.dirname(File.expand_path(__FILE__)) + '/graph-' + type

### Read cache or execute query
if File.exist?(graph_basename + '.yml')
  data = YAML.load(File.read(graph_basename + '.yml'))
else
  scope = SpaceObject
  scope = scope.where(**query) if query
  scope = scope.order(ycolumn + ' DESC NULLS LAST')
  scope = scope.group(:time, group)
  data = scope.limit(500).pluck(group, :time, ycolumn)
  File.open(graph_basename + '.yml', 'w') do |f|
    f.write(data.to_yaml)
  end
end

### Then plot
Gnuplot.open(persist: true) do |gp|
  Gnuplot::Plot.new(gp) do |plot|
    plot.terminal 'png large'
    plot.output graph_basename + '.png'
    plot.xdata :time
    plot.timefmt '"%s"'
    plot.format 'x "%H:%M"'
    plot.xlabel "time"
    plot.ylabel ylabel
    plot.key key_pos if key_pos

    # each row is [group key, time, value]; plot the 10 largest groups
    grouped_data = data.group_by(&:first)
    keys = grouped_data.keys.sort_by {|key| -grouped_data[key].reduce(0) {|sum,d| sum + (d[2]||0) } }
    keys[0,10].each do |key|
      series = grouped_data[key].sort_by {|d| d[1] }
      x = series.map{|d| d[1].to_i + (xoffset||0) }
      # to_f avoids integer division truncating values to whole megabytes
      y = series.map{|d| (d[2]||0).to_f / (yscale||1) }
      plot.data << Gnuplot::DataSet.new( [x, y] ) do |ds|
        ds.using = '1:2'
        ds.with = "linespoints"
        ds.title = key || '(empty)'
      end
    end
  end
end