TruffleRuby's design centers on making the common case fast (no svars needed) while still supporting the frame-local, thread-local semantics Ruby requires. There are several layers of optimization.
Every method frame reserves slot 1 for a SpecialVariableStorage object (see TranslatorEnvironment.java). Block frames do not get their own slot — they share the enclosing method's storage by walking up the declaration frame chain (the depth is cached per FrameDescriptor, so the JIT unrolls this).
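The frame-local and thread-local semantics that this layout has to preserve can be observed from plain Ruby. The following is a minimal sketch; the method names are made up for illustration, and it only demonstrates the language-level behavior, not TruffleRuby's internals.

```ruby
# Each method activation has its own $~; a callee's match does not
# overwrite the caller's.
def inner_match
  "inner" =~ /(in)/
  $1
end

def outer_match
  "outer" =~ /(out)/
  inner = inner_match        # sets $~ in inner_match's frame only
  [$1, inner]
end

# A block shares $~ with the method that encloses it.
def match_inside_block
  [nil].each { "block" =~ /(bl)/ }
  $1
end

# A new thread starts with its own empty $~, even when its block was
# defined in a frame that already holds a match.
def match_seen_by_new_thread
  "main" =~ /(ma)/
  Thread.new { $~ }.value
end

p outer_match               # => ["out", "in"]
p match_inside_block        # => "bl"
p match_seen_by_new_thread  # => nil
```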
SpecialVariableStorage itself is tiny — it holds just two fields:
- `lastMatch` (`$~`): a `ThreadAndFrameLocalStorage`
- `lastLine` (`$_`): a `ThreadAndFrameLocalStorage`
Both are lazily allocated (null until first written). $1, $2, $&, $`, $', $+ are not stored separately — they're derived by reading $~ and indexing into the MatchData at read time.
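That the derived variables are just views onto `$~` is visible at the Ruby level. A small sketch, with illustrative method names, that compares each derived variable against the corresponding `MatchData` accessor and shows that reassigning `$~` changes what `$1` reads:

```ruby
# Every derived special variable agrees with the MatchData held in $~.
def derived_special_variables
  "released 2024-05-17 (stable)" =~ /(\d+)-(\d+)-(\d+)/
  md = $~
  {
    numbered:   [$1, $2, $3] == [md[1], md[2], md[3]],
    whole:      $& == md[0],
    pre:        $` == md.pre_match,
    post:       $' == md.post_match,
    last_group: $+ == md[-1],
  }
end

# $1 has no storage of its own: assigning $~ changes what it returns.
def numbered_refs_follow_last_match
  "first 111" =~ /(\d+)/
  saved = $~
  "second 222" =~ /(\d+)/
  before = $1
  $~ = saved
  [before, $1]
end

p derived_special_variables.values.all?   # => true
p numbered_refs_follow_last_match         # => ["222", "111"]
```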
ThreadAndFrameLocalStorage exploits the observation that svars are almost always accessed from the thread that created them:
```java
public Object get(Node node, InlinedConditionProfile sameThreadProfile) {
    if (sameThreadProfile.profile(node,
            RubyLanguage.getThreadId(Thread.currentThread()) == originalThreadId)) {
        return originalThreadValue; // fast path: plain field read
    } else {
        return fallbackGet(); // slow path: ThreadLocal lookup
    }
}
```

- On creation, it records the current thread ID and stores the value in a plain `Object` field.
- Same-thread access (the overwhelmingly common case): a thread ID comparison + direct field read. No synchronization, no hash lookup.
- Cross-thread access (rare): falls back to a lazily-created `volatile ThreadLocal<Object>`.
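The same owner-thread fast path can be modeled in a few lines of Ruby. This is only a sketch of the idea: `OwnerThreadStorage` and its fields are hypothetical names, and the per-thread fallback is approximated with a mutex-guarded identity hash rather than a real `ThreadLocal`.

```ruby
# Hypothetical model of an owner-thread fast path; not TruffleRuby code.
class OwnerThreadStorage
  def initialize
    @owner = Thread.current   # thread that created the storage
    @owner_value = nil        # plain field used by the owner thread
    @others = nil             # fallback table, created lazily
    @lock = Mutex.new
  end

  def get
    # Fast path: identity check plus a field read.
    return @owner_value if Thread.current.equal?(@owner)
    # Slow path: per-thread lookup in the fallback table.
    @lock.synchronize { @others && @others[Thread.current] }
  end

  def set(value)
    if Thread.current.equal?(@owner)
      @owner_value = value
    else
      @lock.synchronize do
        (@others ||= {}.compare_by_identity)[Thread.current] = value
      end
    end
  end
end

def owner_thread_storage_demo
  storage = OwnerThreadStorage.new
  storage.set(:from_owner)
  seen_by_other = Thread.new do
    before = storage.get
    storage.set(:from_other)
    [before, storage.get]
  end.value
  [storage.get, seen_by_other]
end

p owner_thread_storage_demo   # => [:from_owner, [nil, :from_other]]
```

The fallback table in this sketch keeps every thread that ever wrote to it alive; a real implementation would need a weak or thread-owned table instead.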
Passing the caller's svars on demand is the most impactful optimization and likely the most relevant to CRuby.
Methods like String#=~, Regexp#match, String#gsub, etc. must write $~ into the caller's frame. In CRuby, this means walking the frame stack (via rb_backref_set). Under a JIT, a callee that may reach into its caller's frame at any time would prevent the compiler from optimizing that frame away.
The protocol is coordinated across SpecialVariablesSendingNode, DispatchNode, and ReadCallerVariablesNode.
- Every method's `FrameDescriptor` has an `Assumption` called `"does not need SpecialVariableStorage"` (initially valid).
- At every call site, `DispatchNode` checks the assumption:

  ```java
  if (!specialVariableAssumption.isValid()) {
      RubyArguments.setCallerSpecialVariables(rubyArgs, readingNode.execute(frame, this));
  }
  ```

  If the assumption is still valid (i.e., no callee has ever asked for the caller's svars), nothing is passed, so the call pays no extra cost.
- When a callee (e.g., `Regexp#=~`) actually needs the caller's svars, `ReadCallerVariablesNode.execute()` checks whether the caller passed them in the arguments. If not, it walks the stack once to find the caller's frame, grabs the `SpecialVariableStorage`, and invalidates the assumption on the caller's `FrameDescriptor`.
- From that point on, every call from that method passes svars, but only from call sites in that specific method. Other callers that never invoke svar-needing methods remain unaffected.
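The protocol above can be sketched as a toy model in Ruby. Everything here is hypothetical (`ToyMethod`, `ToyVM`, the `passes_svars` flag standing in for an invalidated assumption, a hash standing in for `SpecialVariableStorage`); the point is only to show that the slow stack walk happens once per caller and that callers which never need svars are never switched over.

```ruby
# Hypothetical toy model of demand-driven svar passing; not TruffleRuby code.
class ToyMethod
  attr_reader :name, :storage
  attr_accessor :passes_svars   # false means the assumption is still valid

  def initialize(name)
    @name = name
    @storage = {}               # stands in for SpecialVariableStorage
    @passes_svars = false
  end
end

class ToyVM
  attr_reader :stack_walks

  def initialize
    @frames = []
    @stack_walks = 0
  end

  # A call site inside caller_method; the block plays the callee.
  def call(caller_method)
    passed = caller_method.passes_svars ? caller_method.storage : nil
    @frames.push(caller_method)
    yield passed
  ensure
    @frames.pop
  end

  # What a callee does when it needs the caller's svars.
  def caller_storage(passed)
    return passed if passed           # fast path: caller passed them
    @stack_walks += 1                 # slow path: find the caller's frame
    caller_method = @frames.last
    caller_method.passes_svars = true # invalidate the assumption
    caller_method.storage
  end
end

def demand_passing_demo
  vm = ToyVM.new
  uses_regexp = ToyMethod.new(:uses_regexp)
  plain = ToyMethod.new(:plain)

  3.times do |i|
    vm.call(uses_regexp) { |passed| vm.caller_storage(passed)[:last_match] = i }
    vm.call(plain) { |_passed| :no_svars_needed }
  end

  [vm.stack_walks, uses_regexp.passes_svars, plain.passes_svars,
   uses_regexp.storage[:last_match]]
end

p demand_passing_demo   # => [1, true, false, 2]
```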
The caller's SpecialVariableStorage is passed in slot 1 of the frame arguments array (see RubyArguments.java):
```java
private enum ArgumentIndicies {
    DECLARATION_FRAME,          // 0
    CALLER_SPECIAL_VARIABLES,   // 1 <-- SpecialVariableStorage or null
    METHOD,                     // 2
    DECLARATION_CONTEXT,        // 3
    FRAME_ON_STACK_MARKER,      // 4
    SELF,                       // 5
    BLOCK,                      // 6
    DESCRIPTOR                  // 7
    // user arguments follow
}
```

Once the assumption is invalidated, `RubyMethodRootNode.execute()` eagerly creates the `SpecialVariableStorage` in the frame on every method entry:

```java
var specialVariablesAssumption = SpecialVariableStorage.getAssumption(frame.getFrameDescriptor());
if (!specialVariablesAssumption.isValid()) {
    SpecialVariableStorage.set(frame, new SpecialVariableStorage());
}
```

Methods whose assumption is still valid skip this entirely.
Methods that use Primitive.caller_special_variables (which calls ReadCallerVariablesNode) are marked with Primitive.always_split. For example, from regexp.rb:
```ruby
def =~(str)
  result = Truffle::RegexpOperations.match(self, str, 0)
  Primitive.regexp_last_match_set(Primitive.caller_special_variables, result)
  result.begin(0) if result
end
Primitive.always_split self, :=~
```

This tells the JIT to clone the method body for each call site. Because the caller frame is call-site-specific, this allows `caller_special_variables` to resolve to a known, constant `SpecialVariableStorage` at compile time rather than doing a polymorphic frame walk.
$~ and $_ are registered as "hooked" global variables in Ruby code (match_data.rb):
```ruby
Truffle::KernelOperations.define_hooked_variable(
  :$~,
  -> s { Primitive.regexp_last_match_get(s) },
  Truffle::RegexpOperations::LAST_MATCH_SET)
```

The getter receives the `SpecialVariableStorage` as its argument. When `ReadGlobalVariableNode` sees a hooked variable with getter arity 1, it automatically calls `GetSpecialVariableStorage` to resolve the current frame's storage and passes it to the getter. This keeps the Ruby-level API clean while the frame machinery is handled automatically.
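The arity-based dispatch can be illustrated with a small stand-alone table of getters. This is a hedged sketch: `HookedVariableTable`, its methods, and the `$plain` variable are hypothetical, and a hash stands in for the frame's storage.

```ruby
# Hypothetical model of hooked-variable reads; not TruffleRuby code.
class HookedVariableTable
  def initialize
    @getters = {}
  end

  def define(name, getter)
    @getters[name] = getter
  end

  # A getter of arity 1 receives the current frame's storage;
  # a getter of arity 0 is called without it.
  def read(name, frame_storage)
    getter = @getters.fetch(name)
    getter.arity == 1 ? getter.call(frame_storage) : getter.call
  end
end

def hooked_variable_demo
  vars = HookedVariableTable.new
  vars.define(:"$~", ->(storage) { storage[:last_match] })
  vars.define(:"$plain", -> { :no_storage_needed })

  frame_storage = { last_match: "md" }
  [vars.read(:"$~", frame_storage), vars.read(:"$plain", frame_storage)]
end

p hooked_variable_demo   # => ["md", :no_storage_needed]
```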
$1, $2, etc. are handled differently in the parser — they desugar to "read $~, then call MatchData#[]" at parse time (YARPTranslator.visitNumberedReferenceReadNode).
When a block is created, BlockDefinitionNode captures the enclosing method's SpecialVariableStorage and stores it on the RubyProc object:
```java
return ProcOperations.createRubyProc(
        ...
        frame.materialize(),
        readSpecialVariableStorageNode.execute(frame, this), // captures svar storage
        ...);
```

This ensures that a proc/lambda shares the svar storage of the frame in which it was defined, matching CRuby semantics without requiring frame walks at runtime.
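The Ruby-level behavior this capture has to reproduce: a proc that performs a match writes `$~` into the frame where the proc was defined, not the frame that happens to call it. A minimal sketch with illustrative method names:

```ruby
# Calls the proc from a different frame than the one that defined it.
def call_elsewhere(matcher)
  matcher.call("order 42")
  $~                         # this frame never matched anything itself
end

def define_and_use_proc
  matcher = proc { |s| s =~ /(\d+)/ }
  seen_by_callee = call_elsewhere(matcher)
  [$1, seen_by_callee]       # the match landed in the defining frame
end

p define_and_use_proc   # => ["42", nil]
```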
For C extensions, the ExtensionCallStack maintains a separate stack of special variable storage entries, pushed/popped as C function calls are made. The primitive cext_special_variables_from_stack retrieves the current one:
```java
@Primitive(name = "cext_special_variables_from_stack")
public abstract static class VarsFromStackNode extends PrimitiveArrayArgumentsNode {
    @Specialization
    Object variables() {
        return getLanguage().getCurrentFiber().extensionCallStack.getSpecialVariables();
    }
}
```

In summary, the optimizations are:

- Lazy svar allocation: `SpecialVariableStorage` is only created when first needed (the svar slot starts as `nil`).
- Assumption-based demand passing: The `specialVariableAssumption` (per method) starts valid. `DispatchNode` and `CallSuperMethodNode` only pass the svar when the assumption is invalid. The assumption is invalidated only when a callee actually needs the caller's svars, which is a one-time deoptimization.
- Fast thread-local: `ThreadAndFrameLocalStorage` uses a direct field access for the original thread, only falling back to `ThreadLocal` for multi-threaded access.
- Lazy `ThreadAndFrameLocalStorage` creation: The `lastMatch` and `lastLine` fields in `SpecialVariableStorage` are only created on first write.
- `Primitive.always_split`: Methods that access `caller_special_variables` are always split/cloned per call site, ensuring the caller frame is known at compile time.
- Frame slot at fixed index: The svar slot is always at index 1 in method frames, making access a single array read with no lookup.
- Declaration depth caching: `GetSpecialVariableStorage` caches the `declarationDepth` per `FrameDescriptor`, so the walk from block to enclosing method is unrolled at compile time.
- Derived `$1`/`$2`: Numbered references do not store separate values; they read `$~` (once) and index into the MatchData, avoiding redundant storage.
- No MatchData created for `match?`: `Regexp#match?` passes `createMatchData=false`, avoiding MatchData allocation entirely when only a boolean result is needed.
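The last point is observable from Ruby: `match?` returns a boolean and leaves `$~` untouched, whereas `=~` updates it. A short sketch, with an illustrative method name:

```ruby
def match_p_leaves_last_match_alone
  $~ = nil
  matched = /b/.match?("abc")   # boolean only, no MatchData published
  after_match_p = $~
  "abc" =~ /b/                  # =~ does set $~
  [matched, after_match_p, $~[0]]
end

p match_p_leaves_last_match_alone   # => [true, nil, "b"]
```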
| Optimization | TruffleRuby Mechanism | CRuby Applicability |
|---|---|---|
| Lazy svar allocation | Slot 1 starts as `nil`, `SpecialVariableStorage` allocated on first write | Could delay svar struct creation until first regex match in a frame |
| Thread-local fast path | Store owner thread ID + value in plain field; only use `ThreadLocal` for cross-thread access | Replace TLS hash lookup with inline thread-ID comparison for the owning thread |
| Demand-driven caller passing | Assumption-based: only pass caller svars when a callee has ever asked for them | YJIT/RJIT could track which call sites need `rb_backref_set` and skip the frame walk otherwise |
| Derived `$1`/`$2` | Not stored; computed from `$~` on read | CRuby already does this |
| Per-call-site specialization | `always_split` clones svar-needing methods per call site | JIT inlining of `rb_backref_set`-using methods lets the JIT statically resolve which caller frame to write to |
The demand-driven passing is probably the highest-impact idea for CRuby. Today, CRuby unconditionally sets up svar plumbing on every method call (via vm_push_frame / the ep chain). If YJIT could start by assuming no callee needs svars and only recompile when one actually does, that would remove overhead from the vast majority of method calls.