@nirvdrum
Created February 17, 2026 23:01

How TruffleRuby Implements svars ($~, $1, $2, etc.)

TruffleRuby's design centers on making the common case fast (no svars needed) while still supporting the frame-local, thread-local semantics Ruby requires. There are several layers of optimization.

1. Storage: SpecialVariableStorage at a Fixed Frame Slot

Every method frame reserves slot 1 for a SpecialVariableStorage object (see TranslatorEnvironment.java). Block frames do not get their own slot — they share the enclosing method's storage by walking up the declaration frame chain (the depth is cached per FrameDescriptor, so the JIT unrolls this).

SpecialVariableStorage itself is tiny — it holds just two fields:

  • lastMatch ($~) — a ThreadAndFrameLocalStorage
  • lastLine ($_) — a ThreadAndFrameLocalStorage

Both are lazily allocated (null until first written). $1, $2, $&, $`, $', $+ are not stored separately — they're derived by reading $~ and indexing into the MatchData at read time.
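The derivation of the numbered references from $~ can be observed from plain Ruby. The following is a minimal sketch that relies only on standard Ruby semantics; the method name is made up for illustration and nothing here is TruffleRuby-specific:

```ruby
# Illustrative only: shows that $1, $&, and $+ are views onto $~.
def numbered_refs_follow_last_match
  date = /(\d+)-(\d+)-(\d+)/.match("2026-02-17")
  /(\w+)@(\w+)/.match("user@host")   # $~ now refers to this second match
  after_match = [$1, $&, $+]

  # No new match runs here; only $~ is reassigned. The derived variables
  # change anyway because they are computed from $~ when they are read.
  $~ = date
  after_assign = [$1, $&, $+]

  [after_match, after_assign]
end
```

Because only $~ is stored, reassigning it is enough to change every derived variable.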

Key source files:

2. Thread Safety Without ThreadLocal Overhead

ThreadAndFrameLocalStorage exploits the observation that svars are almost always accessed from the thread that created them:

public Object get(Node node, InlinedConditionProfile sameThreadProfile) {
    if (sameThreadProfile.profile(node,
            RubyLanguage.getThreadId(Thread.currentThread()) == originalThreadId)) {
        return originalThreadValue;  // fast path: plain field read
    } else {
        return fallbackGet();  // slow path: ThreadLocal lookup
    }
}

  • On creation, it records the current thread ID and stores the value in a plain Object field.
  • Same-thread access (the overwhelmingly common case): a thread ID comparison + direct field read. No synchronization, no hash lookup.
  • Cross-thread access (rare): falls back to a lazily-created volatile ThreadLocal<Object>.
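The same-thread fast path can be modeled in a few lines of Ruby. This is only a sketch: OwnerThreadStorage and its methods are hypothetical names, and the mutex-guarded hash stands in for the lazily created ThreadLocal, which is a simplification rather than what TruffleRuby does.

```ruby
# Hypothetical Ruby model of the idea; not TruffleRuby's actual class.
class OwnerThreadStorage
  def initialize
    @owner = Thread.current   # thread that created the storage
    @owner_value = nil        # plain field used by the owning thread
    @others = nil             # fallback map, created only if another thread writes
    @lock = Mutex.new         # assumption: guards the fallback map in this model
  end

  def get
    # Fast path: identity comparison plus a plain field read.
    return @owner_value if Thread.current.equal?(@owner)

    # Slow path: per-thread lookup in the fallback map.
    @lock.synchronize { @others && @others[Thread.current] }
  end

  def set(value)
    if Thread.current.equal?(@owner)
      @owner_value = value
    else
      @lock.synchronize do
        @others ||= {}.compare_by_identity
        @others[Thread.current] = value
      end
    end
  end

  def fallback_allocated?
    !@others.nil?
  end
end
```

In this model the owning thread never touches the lock or the map, and the map is not even allocated until a second thread writes a value.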

3. Demand-Driven Caller Passing (The Key Optimization)

This is the most impactful optimization and likely the most relevant to CRuby.

The Problem

Methods such as String#=~, Regexp#match, and String#gsub must write $~ into their caller's frame. In CRuby, this means walking the frame stack (via rb_backref_set). Under a JIT, a callee that can reach into its caller's frame this way prevents the compiler from optimizing that frame away.
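The frame-local behavior that forces this write into the caller can be seen in plain Ruby. A small sketch, with illustrative method names and standard Ruby semantics only:

```ruby
def helper_match
  "zzz" =~ /z+/      # sets $~ in helper_match's frame only
  $~[0]
end

def outer_match
  "hello" =~ /l+/    # String#=~ writes $~ into outer_match's frame
  inner = helper_match
  [$~[0], inner]     # outer_match's $~ is untouched by the helper's match
end

def fresh_frame
  $~                 # nil: a new method frame starts with no last match
end
```

String#=~ is an ordinary method call, yet its result lands in the frame of whoever called it, which is why the callee needs access to the caller's storage.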

The Solution: Assumption-Based Demand-Driven Protocol

The protocol is coordinated across SpecialVariablesSendingNode, DispatchNode, and ReadCallerVariablesNode.

  1. Every method's FrameDescriptor has an Assumption called "does not need SpecialVariableStorage" (initially valid).

  2. At every call site, DispatchNode checks the assumption:

    if (!specialVariableAssumption.isValid()) {
        RubyArguments.setCallerSpecialVariables(rubyArgs, readingNode.execute(frame, this));
    }

    If the assumption is still valid (i.e., no callee has ever asked for the caller's svars), nothing is passed, so calls from that method carry no svar overhead.

  3. When a callee (e.g., Regexp#=~) actually needs the caller's svars, ReadCallerVariablesNode.execute() checks whether the caller passed them in the arguments. If not, it walks the stack once to find the caller's frame, grabs the SpecialVariableStorage, and invalidates the assumption on the caller's FrameDescriptor.

  4. From that point on, every call site in that method passes its svars along. Methods that never call svar-needing methods keep their assumption valid and remain unaffected.
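The four steps above can be sketched as a toy model in Ruby. Every name here (ToyCallerFrame, ToyPlainCallee, ToyMatchingCallee) is hypothetical, and the stack walk is simulated by handing the caller to the callee explicitly, so treat this as an illustration of the control flow rather than of the real nodes.

```ruby
# Toy model of the protocol. All names are hypothetical; a real caller is
# found by walking the stack, which this sketch replaces with an explicit
# caller_frame argument.
class ToyCallerFrame
  attr_reader :storage, :passes, :walks

  def initialize
    @assumption_valid = true  # models "does not need SpecialVariableStorage"
    @storage = {}             # stands in for the caller's SpecialVariableStorage
    @passes = 0               # calls that passed svars as a hidden argument
    @walks = 0                # simulated stack walks
  end

  # Step 2: pass svars only once the assumption has been invalidated.
  def call(callee)
    passed = nil
    unless @assumption_valid
      passed = @storage
      @passes += 1
    end
    callee.call(self, passed)
  end

  # Step 3, slow path: hand out the storage and invalidate the assumption.
  def walk_stack_and_invalidate
    @walks += 1
    @assumption_valid = false
    @storage
  end
end

# A callee that never touches its caller's svars.
module ToyPlainCallee
  def self.call(_caller_frame, _passed)
    :ok
  end
end

# A callee that writes a $~-like value into its caller, as Regexp#=~ would.
module ToyMatchingCallee
  def self.call(caller_frame, passed)
    storage = passed || caller_frame.walk_stack_and_invalidate
    storage[:last_match] = "matched"
  end
end
```

In this model, three calls to ToyMatchingCallee from one ToyCallerFrame cause one walk followed by two passes, while a frame that only ever calls ToyPlainCallee passes nothing and never walks.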

Where Svars Live in the Frame Arguments

The caller's SpecialVariableStorage is passed in slot 1 of the frame arguments array (see RubyArguments.java):

private enum ArgumentIndicies {
    DECLARATION_FRAME,          // 0
    CALLER_SPECIAL_VARIABLES,   // 1  <-- SpecialVariableStorage or null
    METHOD,                     // 2
    DECLARATION_CONTEXT,        // 3
    FRAME_ON_STACK_MARKER,      // 4
    SELF,                       // 5
    BLOCK,                      // 6
    DESCRIPTOR                  // 7
    // user arguments follow
}
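To make the layout concrete, here is a hedged Ruby model of packing and reading such an arguments array. ToyArguments, pack, and the accessor names are hypothetical; only the slot order is taken from the enum above.

```ruby
# Hypothetical model of the hidden-argument layout; not TruffleRuby code.
module ToyArguments
  DECLARATION_FRAME        = 0
  CALLER_SPECIAL_VARIABLES = 1
  METHOD                   = 2
  DECLARATION_CONTEXT      = 3
  FRAME_ON_STACK_MARKER    = 4
  SELF                     = 5
  BLOCK                    = 6
  DESCRIPTOR               = 7
  RUNTIME_ARGUMENT_COUNT   = 8  # user arguments start at this index

  # Builds the array: hidden slots first, then the user arguments.
  def self.pack(receiver, user_args, caller_svars: nil, block: nil)
    args = Array.new(RUNTIME_ARGUMENT_COUNT)
    args[CALLER_SPECIAL_VARIABLES] = caller_svars
    args[SELF] = receiver
    args[BLOCK] = block
    args.concat(user_args)
  end

  # nil here means the caller did not pass its svars.
  def self.caller_special_variables(args)
    args[CALLER_SPECIAL_VARIABLES]
  end

  def self.user_argument(args, index)
    args[RUNTIME_ARGUMENT_COUNT + index]
  end
end
```

Reading the caller's svars is then a single indexed read at a fixed position, and a nil in that slot tells the callee it has to fall back to the stack walk.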

Eager Initialization After Deoptimization

Once the assumption is invalidated, RubyMethodRootNode.execute() eagerly creates the SpecialVariableStorage in the frame on every method entry:

var specialVariablesAssumption = SpecialVariableStorage.getAssumption(frame.getFrameDescriptor());
if (!specialVariablesAssumption.isValid()) {
    SpecialVariableStorage.set(frame, new SpecialVariableStorage());
}

Methods whose assumption is still valid skip this entirely.

4. Primitive.always_split for Caller-Svar Methods

Methods that use Primitive.caller_special_variables (which calls ReadCallerVariablesNode) are marked with Primitive.always_split. For example, from regexp.rb:

def =~(str)
  result = Truffle::RegexpOperations.match(self, str, 0)
  Primitive.regexp_last_match_set(Primitive.caller_special_variables, result)
  result.begin(0) if result
end
Primitive.always_split self, :=~

This tells the JIT to clone the method body for each call site. Because the caller frame is call-site-specific, this allows caller_special_variables to resolve to a known, constant SpecialVariableStorage at compile time rather than doing a polymorphic frame walk.
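The effect of splitting can be illustrated with a toy model. ToyBody and ToyCallSite are hypothetical names, and the list of seen call sites is only a stand-in for the profile state a method copy accumulates; this is a sketch of the idea, not of Truffle's splitting machinery.

```ruby
# Toy model of splitting; names are hypothetical. Each ToyBody records which
# call sites have reached it, as a stand-in for per-copy profile state.
class ToyBody
  attr_reader :seen_call_sites

  def initialize
    @seen_call_sites = []
  end

  def call(call_site_id)
    @seen_call_sites |= [call_site_id]  # remember each distinct call site once
    call_site_id
  end

  # True while this copy has been reached from at most one call site.
  def monomorphic?
    @seen_call_sites.size <= 1
  end
end

class ToyCallSite
  attr_reader :body

  # With always_split, the call site gets a private copy of the body.
  def initialize(id, shared_body, always_split:)
    @id = id
    @body = always_split ? ToyBody.new : shared_body
  end

  def call
    @body.call(@id)
  end
end
```

With a shared body, two call sites make the body see two different callers; with always_split, each copy only ever sees its own call site, which is the property that lets the caller's storage be treated as fixed for that copy.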

5. Hooked Global Variables (Ruby-Visible API)

$~ and $_ are registered as "hooked" global variables in Ruby code (match_data.rb):

Truffle::KernelOperations.define_hooked_variable(
  :$~,
  -> s { Primitive.regexp_last_match_get(s) },
  Truffle::RegexpOperations::LAST_MATCH_SET)

The getter receives the SpecialVariableStorage as its argument. When ReadGlobalVariableNode sees a hooked variable with getter arity 1, it automatically calls GetSpecialVariableStorage to resolve the current frame's storage and passes it to the getter. This keeps the Ruby-level API clean while the frame machinery is handled automatically.

$1, $2, etc. are handled differently in the parser — they desugar to "read $~, then call MatchData#[]" at parse time (YARPTranslator.visitNumberedReferenceReadNode).
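The arity-based dispatch and the numbered-reference desugaring can be sketched with a small registry. ToyGlobals and its methods are hypothetical, a Hash stands in for SpecialVariableStorage, and the two-argument setter convention is an assumption made for this sketch.

```ruby
# Hypothetical model of hooked globals; not TruffleRuby's implementation.
class ToyGlobals
  Hooked = Struct.new(:getter, :setter)

  def initialize
    @hooked = {}
  end

  def define_hooked_variable(name, getter, setter)
    @hooked[name] = Hooked.new(getter, setter)
  end

  # A getter of arity 1 receives the current frame's storage automatically.
  def read(name, frame_storage)
    getter = @hooked.fetch(name).getter
    getter.arity == 1 ? getter.call(frame_storage) : getter.call
  end

  # Assumption for this sketch: a setter of arity 2 also receives the storage.
  def write(name, frame_storage, value)
    setter = @hooked.fetch(name).setter
    setter.arity == 2 ? setter.call(frame_storage, value) : setter.call(value)
  end

  # Models the desugaring of $1, $2, ...: read $~, then index into it.
  def read_numbered(index, frame_storage)
    match = read(:$~, frame_storage)
    match && match[index]
  end
end

globals = ToyGlobals.new
globals.define_hooked_variable(
  :$~,
  ->(s) { s[:last_match] },
  ->(s, v) { s[:last_match] = v })
frame_storage = {}
globals.write(:$~, frame_storage, /(\d+)/.match("a42"))
globals.read_numbered(1, frame_storage)
```

The final call reads the stored MatchData and indexes group 1, so no separate slot for $1 is needed.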

6. Block and Proc Integration

When a block is created, BlockDefinitionNode captures the enclosing method's SpecialVariableStorage and stores it on the RubyProc object:

return ProcOperations.createRubyProc(
        ...
        frame.materialize(),
        readSpecialVariableStorageNode.execute(frame, this),  // captures svar storage
        ...);

This ensures that a proc/lambda shares the svar storage of the frame in which it was defined, matching CRuby semantics without requiring frame walks at runtime.
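The semantics being matched here can be checked from plain Ruby. A minimal sketch with illustrative method names, assuming standard Ruby behavior for blocks and procs:

```ruby
def block_shares_method_svars
  ["inside block"].each { |s| s =~ /block/ }  # the match runs inside the block
  $~ && $~[0]                                  # and is visible to the method
end

def make_procs
  matcher = proc { |s| s =~ /\d+/ }  # writes $~ where the proc was defined
  reader  = proc { $~ && $~[0] }     # reads $~ from that same storage
  [matcher, reader]
end

def call_matcher(matcher)
  matcher.call("order 66")
  $~  # this method's own $~, which the proc's match does not touch
end
```

A match performed inside a proc is recorded against the frame that defined the proc, not against the method that happened to call it, so the reader proc from make_procs sees the match while call_matcher does not.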

7. C Extension Support

For C extensions, the ExtensionCallStack maintains a separate stack of special variable storage entries, pushed/popped as C function calls are made. The primitive cext_special_variables_from_stack retrieves the current one:

@Primitive(name = "cext_special_variables_from_stack")
public abstract static class VarsFromStackNode extends PrimitiveArrayArgumentsNode {
    @Specialization
    Object variables() {
        return getLanguage().getCurrentFiber().extensionCallStack.getSpecialVariables();
    }
}

8. Summary of Optimizations

  1. Lazy svar allocation: SpecialVariableStorage is only created when first needed (the svar slot starts as nil).

  2. Assumption-based demand passing — The specialVariableAssumption (per method) starts valid. DispatchNode and CallSuperMethodNode only pass the svar when the assumption is invalid. The assumption is invalidated only when a callee actually needs the caller's svars — a one-time deoptimization.

  3. Fast thread-local: ThreadAndFrameLocalStorage uses a direct field access for the original thread, only falling back to ThreadLocal for multi-threaded access.

  4. Lazy ThreadAndFrameLocalStorage creation — The lastMatch and lastLine fields in SpecialVariableStorage are only created on first write.

  5. Primitive.always_split — Methods that access caller_special_variables are always split/cloned per call site, ensuring the caller frame is known at compile time.

  6. Frame slot at fixed index — The svar slot is always at index 1 in method frames, making access a single array read with no lookup.

  7. Declaration depth caching: GetSpecialVariableStorage caches the declarationDepth per FrameDescriptor, so the walk from block to enclosing method is unrolled at compile time.

  8. Derived $1/$2 — Numbered references do NOT store separate values — they read $~ (once) and index into the MatchData, avoiding redundant storage.

  9. No MatchData created for match?: Regexp#match? passes createMatchData=false, avoiding MatchData allocation entirely when only a boolean result is needed.
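The last item rests on documented Ruby behavior: Regexp#match? returns a boolean and does not update $~. A short sketch with an illustrative method name:

```ruby
def match_p_leaves_last_match_alone
  "abc" =~ /b/               # sets $~ to the match for "b"
  hit = /c/.match?("abc")    # boolean result only; $~ is not updated
  [hit, $~[0]]
end
```

Since the caller's $~ must stay as it was, an implementation has no reason to build a MatchData for match? at all.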

9. Applicability to CRuby

| Optimization | TruffleRuby Mechanism | CRuby Applicability |
| --- | --- | --- |
| Lazy svar allocation | Slot 1 starts as nil, SpecialVariableStorage allocated on first write | Could delay svar struct creation until first regex match in a frame |
| Thread-local fast path | Store owner thread ID + value in plain field; only use ThreadLocal for cross-thread | Replace TLS hash lookup with inline thread-ID comparison for the owning thread |
| Demand-driven caller passing | Assumption-based: only pass caller svars when a callee has ever asked for them | YJIT/RJIT could track which call sites need rb_backref_set and skip the frame walk otherwise |
| Derived $1/$2 | Not stored; computed from $~ on read | CRuby already does this |
| Per-call-site specialization | always_split clones svar-needing methods per call site | JIT inlining of rb_backref_set-using methods lets the JIT statically resolve which caller frame to write to |

The demand-driven passing is probably the highest-impact idea for CRuby. Today, CRuby unconditionally sets up svar plumbing on every method call (via vm_push_frame / the ep chain). If YJIT could start by assuming no callee needs svars and only recompile when one actually does, that would remove overhead from the vast majority of method calls.
