This is an experiment to pre-train and AOT enough of JRuby to get startup time for "gem list" down to a reasonable speed. I've tried various combintions and so far this seems to be the best, but the timings are highly variable.
Step 1: Perform a gem list while dumping IR. This saves the compiled IR for all loaded files to disk, to skip parsing and compiling in future runs.
jruby --dev -Xir.writing -S gem list > /dev/null
Step 2: Train AOT by running the following script repeatedly with the given JRuby arguments. Requires a small JRuby patch because IR reading breaks for any script has not been written (like the training script).
Patch:
diff --git a/core/src/main/java/org/jruby/parser/ParserManager.java b/core/src/main/java/org/jruby/parser/ParserManager.java
index d0b1b032bb..c882badc09 100644
--- a/core/src/main/java/org/jruby/parser/ParserManager.java
+++ b/core/src/main/java/org/jruby/parser/ParserManager.java
@@ -116,7 +116,7 @@ public class ParserManager {
if (parserTiming) nanos = System.nanoTime();
ParseResult result;
- if (RubyInstanceConfig.IR_READING) {
+ if (RubyInstanceConfig.IR_READING && type != INLINE && type != MAIN) {
result = loadFileFromIRPersistence(fileName, lineNumber, in, encoding, scope, type);
} else {
result = parser.parse(fileName, lineNumber, in, encoding, scope, type);
Script:
1000.times do
ruby = org.jruby.Ruby.new_instance
ruby.eval_scriptlet('require "rubygems"')
ruby.tear_down
print ?.
end
Command line:
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCacheOutput=jruby.aot training.rb
Step 3: Run gem list again with AOT cache and IR reading.
[] jruby $ time jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCache=jruby.aot -S gem list > /dev/null
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -S gem lis 1.28s user 0.15s system 181% cpu 0.789 total
[] jruby $ time jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCache=jruby.aot -S gem list > /dev/null
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -S gem lis 1.27s user 0.16s system 182% cpu 0.785 total
[] jruby $ time jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCache=jruby.aot -S gem list > /dev/null
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -S gem lis 1.28s user 0.16s system 182% cpu 0.786 total
[] jruby $ time jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCache=jruby.aot -S gem list > /dev/null
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -S gem lis 1.28s user 0.15s system 182% cpu 0.782 total
[] jruby $ time jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -J-XX:AOTCache=jruby.aot -S gem list > /dev/null
jruby --dev --nocache -Xir.reading -J-XX:+UseCompactObjectHeaders -S gem lis 1.28s user 0.16s system 182% cpu 0.787 total
The sub-0.8s timing is not consistent, though, and most often it seems to be in the 1.4-1.5s range. I can't isolate a reason for why it's sometimes super fast and other times takes twice as long.
Normal execution with dev mode and AppCDS instead of AOT:
[] jruby $ time jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null
jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null 1.41s user 0.16s system 162% cpu 0.963 total
[] jruby $ time jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null
jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null 1.41s user 0.16s system 162% cpu 0.966 total
[] jruby $ time jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null
jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null 1.41s user 0.16s system 162% cpu 0.965 total
[] jruby $ time jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null
jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null 1.40s user 0.16s system 162% cpu 0.958 total
[] jruby $ time jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null
jruby --dev -J-XX:+UseCompactObjectHeaders -S gem list > /dev/null 1.41s user 0.15s system 163% cpu 0.957 total