Why do we need indifferent_hash while saving?
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",
Still within the save path, it converts every inner hash to an indifferent hash too, meaning even more allocations.
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:192:in `convert_value'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:109:in `block in update'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:105:in `each_pair'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:105:in `update'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:74:in `initialize'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `new'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",
Again, indifferent_hash while saving:
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:117:in `save_step'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:80:in `save_step'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan/steps/abstract.rb:62:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:24:in `open_action'",
Again while saving:
"/home/shim/Documents/foreman/dynflow/lib/dynflow/utils.rb:60:in `indifferent_hash'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:268:in `extract_metadata'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:226:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:80:in `save_execution_plan'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:51:in `save_execution_plan'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan.rb:338:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/execution_plan.rb:87:in `update_state'",
General question: why are we using SQL databases to store JSON data? Wouldn't it be more efficient to work with JSON-oriented storages like Mongo/CouchDB, etc.?
The save procedure first fetches the existing record into memory to decide between INSERT and UPDATE. What is the ratio of actual updates to inserts? Either way, IMHO issuing an UPDATE and inspecting the number of affected rows (falling back to INSERT when nothing was updated) looks far more efficient than SELECT followed by UPDATE or INSERT; see the sketch after the trace below.
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:200:in `first'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `block in save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:326:in `with_retry'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",
Other performance issue: table(table_name).columns (lib/dynflow/persistence_adapters/sequel.rb:223) induces an extra query to the DB, fetching a single row just for the sake of getting the column names (see the sketch after the trace below). From the method's own documentation:
# If you are looking for all columns for a single table and maybe some information about
# each column (e.g. database type), see <tt>Database#schema</tt>.
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:73:in `columns'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:223:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence.rb:32:in `save_action'",
General improvement: we can create an indifferent_hash directly from JSON by passing the :object_class option to the parser.
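A minimal sketch with the stdlib parser, assuming Dynflow is loaded and Dynflow::Utils::IndifferentHash responds to []= the way :object_class requires:

```ruby
require 'json'

# Parse straight into indifferent hashes instead of building plain
# hashes first and converting the whole tree afterwards.
data = JSON.parse('{"foo": {"bar": 1}}',
                  object_class: Dynflow::Utils::IndifferentHash)
data[:foo][:bar] # => 1, no post-hoc conversion pass needed
```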
Another thought, brought up by Ohad: can we split Dynflow into two parts, a scheduler and a parallel executor? If we can, IMHO we could benefit from reusing an existing parallel-execution framework (and not deal with the tons of concurrency bugs/memory issues that hide there).
["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/product_variant.rb:178", 211],
["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/database/misc.rb:257", 218],
["/home/shim/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/time.rb:361", 276],
["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19", 298],
["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/type_check.rb:25", 300],
["/home/shim/Documents/foreman/dynflow/lib/dynflow/serializable.rb:48", 335],
["/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359", 394]
In lib/dynflow/persistence_adapters/sequel.rb:142 it is worth limiting the SELECT to a single field (:data).
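A minimal sketch, assuming the dataset at that line is built roughly like table(table_name).where(condition), with condition standing in for whatever filter is used there:

```ruby
# Fetch only the serialized payload; fewer selected columns also means
# fewer per-row typecasts inside the adapter.
table(table_name).where(condition).select(:data).to_a
```

The per-column typecasting cost shows up in these traces: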
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359:in `base_type_name'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:325:in `block (2 levels) in fetch_rows'",
...
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:970:in `execute'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:322:in `fetch_rows'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:141:in `each'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:142:in `to_a'",
or
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/byebug-9.0.5/lib/byebug/context.rb:96:in `at_line'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:359:in `base_type_name'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/adapters/sqlite.rb:325:in `block (2 levels) in fetch_rows'",
...
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:652:in `single_record'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/sequel-4.38.0/lib/sequel/dataset/actions.rb:200:in `first'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:233:in `block in save'",
sequel.rb:233 appears more than once in this investigation; it probably needs refactoring.
We don't have to create a new instance of the value either: we can reuse the existing object and recursively check for values that are not yet indifferent hashes. When we find one, we replace it with its indifferent-hash equivalent.
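A hypothetical sketch of that idea (the helper name is mine, and it assumes []= on the indifferent hash would store already-converted values without re-converting them, i.e. the current convert_value would need a fast path for that):

```ruby
# Reuse objects that are already indifferent; allocate replacements only
# for plain Hashes, and walk Arrays in place.
def ensure_indifferent(value)
  case value
  when Dynflow::Utils::IndifferentHash
    # already converted: reuse the instance, just walk its values
    value.each_pair { |k, v| value[k] = ensure_indifferent(v) }
    value
  when Hash
    # plain hash found: replace it with its indifferent equivalent
    value.each_with_object(Dynflow::Utils::IndifferentHash.new) do |(k, v), h|
      h[k] = ensure_indifferent(v)
    end
  when Array
    value.map! { |v| ensure_indifferent(v) }
  else
    value
  end
end
```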
Also, in lib/dynflow/action.rb:513 the code serializes the value but does nothing with the result. Maybe we should be optimistic here and serialize only when we actually need the result; another option would be a specialized method.
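A hypothetical sketch of the optimistic variant (the method names are mine, not Dynflow's):

```ruby
# Optimistic: store the value as-is and let serialization errors surface
# at the point where the serialized form is actually needed.
def output=(value)
  @output = value
end

def serialized_output
  MultiJson.dump(@output) # raises only when we really serialize
end
```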
We should avoid calling methods with splat parameters when we can: a splat creates an array object for each call! ????? Couldn't hit this BP ?????
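A quick demonstration of the per-call array allocation, using only the core ObjectSpace counters:

```ruby
# Compare T_ARRAY counts around many calls: the splat version allocates
# roughly one array per call, the fixed-arity version allocates none.
def with_splat(*args); args.length; end
def without_splat(a, b); 2; end

GC.disable # keep counts from being disturbed mid-measurement
before = ObjectSpace.count_objects[:T_ARRAY]
10_000.times { with_splat(1, 2) }
puts ObjectSpace.count_objects[:T_ARRAY] - before # ~10_000

before = ObjectSpace.count_objects[:T_ARRAY]
10_000.times { without_splat(1, 2) }
puts ObjectSpace.count_objects[:T_ARRAY] - before # ~0
```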
break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19
This line creates a whole load of strings (probably the keys and values being dumped):
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapters/json_common.rb:19:in `dump'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json/adapter.rb:25:in `dump'",
"/home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/multi_json-1.12.1/lib/multi_json.rb:139:in `dump'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:273:in `dump_data'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:224:in `prepare_record'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:236:in `save'",
"/home/shim/Documents/foreman/dynflow/lib/dynflow/persistence_adapters/sequel.rb:125:in `save_action'",
Unfortunately, we can do nothing about this.
While parsing dates, the system creates a bunch of strings with intermediate results:
"09",
"11",
"2016",
"2016-11-09 ",
" ",
"03",
"24",
"17",
"17:24:03",
"17:24:03",
"2016-11-09 17:24:03",
" ",
"2016-11-09 17:24:03",
"2016-11-09 17:24:03",
"2016-11-09 17:24:03",
Why are we parsing so many dates? For a single iteration it adds up to 276 intermediate results...
Again it comes from sequel.rb:233.
Again table(table_name).columns (sequel.rb:223).
Again it comes from sequel.rb:233.
break /home/shim/.rvm/gems/ruby-2.2.1@rails4/gems/algebrick-0.7.3/lib/algebrick/product_variant.rb:178
????? Couldn't hit this BP ?????
RUBY_GC_HEAP_INIT_SLOTS=200000 RUBY_GC_HEAP_FREE_SLOTS=10000 RUBY_GC_HEAP_GROWTH_FACTOR=1.25 ruby /home/shim/Documents/foreman/dynflow/examples/remote_executor.rb server
Before the first task runs, the system holds around 200K objects; this is why we set RUBY_GC_HEAP_INIT_SLOTS to 200K.
Each iteration allocates around 6K objects; since we don't want the system to make more than one GC cycle per task, we bump RUBY_GC_HEAP_FREE_SLOTS (the minimum number of free slots kept available after a GC) to 10K. This number should be tuned further with real-world tasks. Since we start with a pretty big heap, we can grow at a slower rate: I have limited the growth to 25% of the heap size instead of the default 80%.
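A quick way to verify the one-GC-per-task goal from inside the process (run_one_task is a stand-in for whatever drives a single iteration):

```ruby
# GC.stat(:count) is the total number of GC runs so far (MRI 2.1+).
before = GC.stat(:count)
run_one_task # hypothetical driver for a single iteration
puts GC.stat(:count) - before # should stay at 0 or 1
```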
- https://github.com/ruby-prof/ruby-prof
- https://github.com/ko1/gc_tracer
- https://github.com/ko1/allocation_tracer
A good how-to guide for these tools: https://gist.github.com/ko1/40110a3d951c19ed6979
GC internals: http://tmm1.net/ruby21-rgengc/
Heap dump analysis: https://github.com/schneems/heapy
JS UI: https://github.com/tenderlove/heap-analyzer
graph plot: https://cirw.in/blog/find-references
Based on both UIs, I have created the heap_analyzer.rb script (attached). It works on heap dumps produced by the modified_remote_executor.rb example. Make sure you uncomment the right dump lines.