Dynflow performance analysis.

break /home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60

Why do we need indifferent_hash while saving?

 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",

Still in the save path: it converts every nested hash to an indifferent hash too, meaning even more allocations. A minimal illustration follows the trace below.

 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:192:in `convert_value'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:109:in `block in update'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:105:in `each_pair'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:105:in `update'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:74:in `initialize'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `new'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",

Again, indifferent_hash while saving

 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:117:in `save_step'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:80:in `save_step'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan/steps/abstract.rb:62:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:24:in `open_action'",

Again while saving

 "/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:80:in `save_execution_plan'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:51:in `save_execution_plan'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan.rb:338:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan.rb:87:in `update_state'",

General question: Why are we using SQL DBs to store JSON data? Wouldn't it be more efficient to work with JSON-oriented storage like MongoDB/CouchDB etc.?


break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:329

The save procedure first loads the existing record into memory. What is the ratio of actual updates as opposed to inserts? Either way, IMHO issuing an UPDATE query and inspecting the number of affected rows looks far more efficient than a SELECT followed by an UPDATE or INSERT respectively; a sketch follows the trace below.

 "/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:200:in `first'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `block in save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:326:in `with_retry'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",

Other performance issue: table(table_name).columns (lib/dynflow/persistence_adapters/sequel.rb:223) issues an extra query to the DB, fetching a row just for the sake of getting the column names. From the description of the method:

# If you are looking for all columns for a single table and maybe some information about
# each column (e.g. database type), see <tt>Database#schema</tt>.
 "/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:73:in `columns'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:223:in `prepare_record'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
 "/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/json-2.0.2/lib/json/common.rb:156

General improvement: we can build the indifferent hash directly while parsing the JSON by passing the :object_class option.
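A minimal sketch, assuming the indifferent hash class exposes []= as JSON's :object_class requires (a stand-in Hash subclass is used here instead of Dynflow's real class):

 require 'json'

 # stand-in indifferent hash: stores everything under string keys
 class IndifferentHash < Hash
   def [](key)         super(key.to_s)        end
   def []=(key, value) super(key.to_s, value) end
 end

 # the parser instantiates IndifferentHash for every JSON object directly,
 # so no second conversion pass over the parsed structure is needed
 data = JSON.parse('{"foo": {"bar": 1}}', object_class: IndifferentHash)
 data[:foo][:bar] # => 1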


Another thought brought up by Ohad: can we separate Dynflow into two parts, a scheduler and a parallel executor? If we can, IMHO we could benefit from reusing an existing parallel-execution framework (and avoid dealing with the tons of concurrency bugs/memory issues that hide there).


Most allocated objects, counted by creation site:

 ["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/product_variant.rb:178", 211],
 ["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/database/misc.rb:257", 218],
 ["/home/shim/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/time.rb:361", 276],
 ["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19", 298],
 ["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/type_check.rb:25", 300],
 ["/home/shim/Documents/foreman/dynflow/lib/dynflow/serializable.rb:48", 335],
 ["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359", 394]

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359

In lib/dynflow/persistence_adapters/sequel.rb:142 it is worth limiting the SELECT to a single field (:data); see the sketch after the traces below.

"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359:in `base_type_name'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:325:in `block (2 levels) in fetch_rows'",
...
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:970:in `execute'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:322:in `fetch_rows'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:141:in `each'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:142:in `to_a'",

or

"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/byebug-9.0.5/lib/byebug/context.rb:96:in `at_line'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359:in `base_type_name'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:325:in `block (2 levels) in fetch_rows'",
...
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:652:in `single_record'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:200:in `first'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `block in save'",

sequel.rb:233 appears more than once in this investigation; it probably needs refactoring.
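A sketch of the narrowing suggested above for sequel.rb:142 (table name and filter are illustrative):

 # only :data has to be deserialized; skipping the other columns avoids
 # the per-column type conversion (base_type_name) seen in the traces above
 table(table_name).where(filter).select(:data).to_a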

break /home/shim/Documents/foreman/dynflow/lib/dynflow/serializable.rb:48

We don't have to create a new instance of the value: we can reuse the existing one and recursively check that it contains nothing that is not a hash, replacing any offending value with its hash form. Also, in lib/dynflow/action.rb:513 the code serializes the value but does nothing with the result. Maybe we should be optimistic here and serialize only when we actually need the result; another option would be a specialized method. A sketch of the reuse idea follows.
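The sketch, under those assumptions (hypothetical helper, not Dynflow's actual API):

 # walk the value in place: keep plain data as-is and replace anything that
 # is not already serializable with its hash form, instead of rebuilding
 # the whole structure on every save
 def ensure_serializable!(value)
   case value
   when Hash
     value.each { |k, v| value[k] = ensure_serializable!(v) }
   when Array
     value.map! { |v| ensure_serializable!(v) }
   when String, Numeric, Symbol, TrueClass, FalseClass, NilClass
     value
   else
     ensure_serializable!(value.to_hash)
   end
 end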

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/type_check.rb:25

We should avoid calling methods with splat params when we can: a splat allocates a new Array object on every call! ????? Couldn't hit this BP ?????
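For reference, the allocation difference looks like this (illustrative methods, not Algebrick's actual code):

 # each call to the splat version allocates a fresh Array for `types`;
 # the fixed-arity version allocates nothing extra
 def splat_match?(value, *types) types.any? { |t| t === value } end
 def single_match?(value, type)  type === value                 end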

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19

This line creates a whole load of strings (probably keys and values):

"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19:in `dump'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapter.rb:25:in `dump'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json.rb:139:in `dump'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:273:in `dump_data'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:224:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",

Unfortunately, we can do nothing about this.

break /home/shim/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/time.rb:361

While parsing dates, the system creates a bunch of strings with intermediate results:

"09",
 "11",
 "2016",
 "2016-11-09  ",
 " ",
 "03",
 "24",
 "17",
 "17:24:03",
 "17:24:03",
 "2016-11-09 17:24:03",
 " ",
 "2016-11-09 17:24:03",
 "2016-11-09 17:24:03",
 "2016-11-09 17:24:03",

Why are we parsing so many dates? For a single iteration this adds up to 276 intermediate results...

Again it comes from sequel.rb:233, and again from table(table_name).columns (sequel.rb:223).

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/database/misc.rb:257

Again it comes from sequel.rb:233.

break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/product_variant.rb:178

????? Couldn't hit this BP ?????


GC Optimizations

RUBY_GC_HEAP_INIT_SLOTS=200000 RUBY_GC_HEAP_FREE_SLOTS=10000 RUBY_GC_HEAP_GROWTH_FACTOR=1.25 ruby /home/shim/Documents/foreman/dynflow/examples/remote_executor.rb server

Before the first task runs, the system already holds around 200K objects, which is why RUBY_GC_HEAP_INIT_SLOTS is set to 200K. Each iteration allocates around 6K objects; since we don't want the system to run more than one GC cycle per task, we bump RUBY_GC_HEAP_FREE_SLOTS (the minimum number of free slots kept available after GC) to 10K. This number should be tuned further with real-world tasks. Since we start with a pretty big heap, we can grow it at a slower rate: I have limited the growth to 25% of the heap size instead of the default 80%.
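One way to verify the ~6K-objects-per-task figure on a real workload (run_one_task is a placeholder for triggering a single task):

 before = GC.stat(:total_allocated_objects)
 run_one_task
 # the delta is the per-task allocation count RUBY_GC_HEAP_FREE_SLOTS should exceed
 puts GC.stat(:total_allocated_objects) - before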


Tools I was using

A good how-to guide for these tools: https://gist.github.com/ko1/40110a3d951c19ed6979

GC internals: http://tmm1.net/ruby21-rgengc/

Heap dump analysis: https://github.com/schneems/heapy

JS UI: https://github.com/tenderlove/heap-analyzer

graph plot: https://cirw.in/blog/find-references

Based on both UIs I have created the heap_analyzer.rb script (attached). It works on heap dumps produced by the modified_remote_executor.rb example. Make sure you uncomment the right dump lines.

require 'json'
require 'set' # used by draw_puml / render_edges
require 'pry' # the script drops into a pry session at the end

# Loads a heap dump (one JSON object per line, as produced by
# ObjectSpace.dump_all) and indexes it by address and by parent.
class Heap
  attr_reader :objects, :obj_index, :parents_index

  def initialize(filename, classes_map)
    @filename = filename
    @classes_map = JSON.parse(File.read(classes_map))
  end

  def load
    @objects = []
    @obj_index = {}
    @parents_index = {}
    read do |parsed|
      parsed['class_name'] = @classes_map[parsed['class'].to_i(16).to_s] if parsed['class']
      address = parsed['address']
      @obj_index[address] = parsed
      @objects << parsed
      (parsed['references'] || []).each do |child_address|
        parents = @parents_index[child_address] || []
        parents << address
        @parents_index[child_address] = parents
      end
    end
  end

  private

  def read
    File.open(@filename) do |f|
      f.each_line do |line|
        begin
          parsed = JSON.parse(line)
          yield parsed
        rescue JSON::ParserError
          puts "Could not parse #{line}"
        end
      end
    end
  end
end

# Renders the reference graph of the given roots as a PlantUML class diagram.
def draw_puml(roots, objects, prefix)
  visited = Set.new
  edges = render_edges(roots, visited)
  obj_hash = objects.select { |k, v| visited.member?(k) || roots.member?(v) }
  if obj_hash.count > 2
    File.open("/tmp/objects_map_#{prefix}.puml", "w") do |f|
      f.write("@startuml\n")
      f.write(render_classes(obj_hash))
      f.write(edges)
      f.write("\n@enduml\n")
    end
    puts 'puml done'
  end
end

def render_classes(objects)
  objects.values.map { |o| render_object(o) }.join("\n")
end

def render_object(obj)
  "class #{obj['address']} << #{obj['class_name']} >>" +
    if obj['value']
      " {\n value = #{obj['value']}\n}\n"
    else
      "\n"
    end
end

def render_edges(roots, visited = Set.new)
  return "" if roots.empty?
  next_roots = []
  roots.map do |root|
    refs = root['references']
    if refs
      next_roots += refs.map { |child| child if visited.add?(child) }.compact
      refs.map { |child| "#{root['address']} -> #{child}" }.join("\n")
    else
      nil
    end
  end.compact.join("\n") + render_edges(next_roots, visited).to_s
end

before_second = Heap.new '/tmp/dump_before_gc_2', '/tmp/classes_map_2'
before_second.load
after_second = Heap.new '/tmp/dump_after_gc_2', '/tmp/classes_map_2'
after_second.load

# objects present before GC but gone after it, i.e. the garbage of one cycle
cleaned = before_second.obj_index.reject { |k, v| after_second.obj_index.key?(k) }
# garbage objects that no other object referenced
roots = cleaned.reject { |k, v| before_second.parents_index.key?(k) }
by_file = cleaned.values.group_by { |obj| "#{obj['file']}:#{obj['line']}" }
most_created = by_file.sort_by { |k, v| v.length }.last(10).map { |k, v| [k, v.first['class_name'], v.length] }

# i = 0
# roots.values.each do |root|
#   draw_puml([root], cleaned, i += 1)
# end
binding.pry
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-

require_relative 'example_helper'
require_relative 'orchestrate_evented'
require 'tmpdir'
require 'csv'

class SampleAction < Dynflow::Action
  def plan
    # number = rand(1e10)
    @@number ||= 0
    @@number += 1
    puts "Planning action: #{@@number}"
    plan_self(number: @@number)
  end

  def run
    # require 'pry'
    # binding.pry
    #
    puts "Running task number #{input[:number]}"
    # puts GC.stat
    # classes_map = {}
    # ObjectSpace.each_object do |o|
    #   begin
    #     classes_map[(o.object_id << 1)] = o.name if o.is_a?(Class) && o.object_id.is_a?(Integer)
    #   rescue Exception => ex
    #     binding.pry
    #     i = 0
    #   end
    # end
    # File.open("/tmp/classes_map_#{input[:number]}", "w") do |f|
    #   f.write(classes_map.to_json)
    # end
    # io = File.open("/tmp/dump_before_gc_#{input[:number]}", "w")
    # ObjectSpace.dump_all(output: io)
    # io.close
    # GC.start
    # io = File.open("/tmp/dump_after_gc_#{input[:number]}", "w")
    # ObjectSpace.dump_all(output: io)
    # io.close
    # require 'pry'
    # binding.pry
    puts 'Done'
  end
end

class RemoteExecutorExample
  class << self
    def run_observer
      world = ExampleHelper.create_world do |config|
        config.persistence_adapter = persistence_adapter
        config.connector = connector
        config.executor = false
      end
      run(world)
    end

    def run_server
      require 'gc_tracer'
      GC::Tracer.start_logging
      # require 'allocation_tracer'
      # ObjectSpace::AllocationTracer.trace
      world = ExampleHelper.create_world do |config|
        config.persistence_adapter = persistence_adapter
        config.connector = connector
      end
      run(world)
      # results = ObjectSpace::AllocationTracer.stop
      #
      # puts ObjectSpace::AllocationTracer.header.join("\t")
      # results.sort_by { |k, v| -v.last }.first(30).each { |k, v|
      #   puts (k + v).join("\t")
      # }
      #
      # puts GC.stat
    end

    def run(world)
      begin
        ExampleHelper.run_web_console(world)
      rescue Errno::EADDRINUSE
        require 'io/console'
        puts "Running without a web console. Press q<enter> to quit."
        until STDIN.gets.chomp == 'q'
        end
      end
    end

    def db_path
      File.expand_path("/tmp/remote_executor_db.sqlite", __FILE__)
    end

    def persistence_conn_string
      ENV['DB_CONN_STRING'] || "sqlite://#{db_path}"
    end

    def persistence_adapter
      Dynflow::PersistenceAdapters::Sequel.new persistence_conn_string
    end

    def connector
      Proc.new { |world| Dynflow::Connectors::Database.new(world) }
    end

    def run_client
      world = ExampleHelper.create_world do |config|
        config.persistence_adapter = persistence_adapter
        config.executor = false
        config.connector = connector
      end
      # world.trigger(OrchestrateEvented::CreateInfrastructure)
      # world.trigger(OrchestrateEvented::CreateInfrastructure, true)
      10.times do
        world.trigger(SampleAction).finished.wait
        # sleep 0.1
      end
    end
  end
end

command = ARGV.first || 'server'

if $0 == __FILE__
  case command
  when 'observer'
    puts <<MSG
The observer is starting…. You can see what's going on there
MSG
    RemoteExecutorExample.run_observer
  when 'server'
    puts <<MSG
The server is starting…. You can send the work to it by running:
#{$0} client
MSG
    RemoteExecutorExample.run_server
  when 'client'
    RemoteExecutorExample.run_client
  else
    puts "Unknown command #{command}"
    exit 1
  end
end