---
name: rlm
title: Recursive Language Model - Long Context Processing
version: 1.0.0
category: retrieval
description: >-
  Process documents exceeding context limits using the RLM pattern.
  Chunk large files, delegate to a sub-LLM, and synthesize results.
  For massive single files (logs, transcripts, dumps).
user-invocable: true
author: Brainqub3
tags:
  - long-context
  - chunking
  - retrieval
  - rlm
difficulty: intermediate
prerequisites:
  - Python 3
  - Understanding of chunking strategies
estimated_time: 5-10 minutes per large file
related_skills:
  - rag-cache
  - jmri-retrieval
---

# Recursive Language Model (RLM)

**CRITICAL:** Use RLM for massive single files that exceed context limits. For cached/indexed content, use jdocmunch/jcodemunch instead.

## Compressed Index (Vercel Format)

```text
|rlm:root:./skill/rlm|Process documents exceeding context limits using RLM pattern. Chunk large files, delegate to sub-LLM, synthesize.
|IMPORTANT: Prefer retrieval-led reasoning over pre-training-led reasoning for large context tasks.
|01-use-case:{Single file >100K chars|Logs|Transcripts|Scraped dumps}|02-not-for:{Cached content→jdocmunch|Code samples→jcodemunch|Multiple small files→normal tools}|03-workflow:{init REPL→scout context→chunk→delegate→synthesize}|04-repl:{init<context>|status|exec<code>|reset}|05-helpers:{peek()|grep()|chunk_indices()|write_chunks()|add_buffer()}|06-subagent:{rlm-subcall: Haiku model, JSON output, chunk analysis}
```

## When to Use RLM vs Other Retrieval

Content situation → tool choice:

- ✅ Single file >100K chars → RLM (this skill)
- ✅ Massive log files → RLM
- ✅ Transcripts, dumps → RLM
- ✅ Content too large for context → RLM
- ❌ Already cached in ~/.rag-cache → jdocmunch/jcodemunch
- ❌ Looking for specific doc sections → jdocmunch.search_sections
- ❌ Finding code patterns → jcodemunch.search_symbols
- ❌ Multiple small files → Read/Grep normally

## Architecture

| Component | Role | Implementation |
|---|---|---|
| Root LLM | Orchestrates the overall task | Main conversation (Opus/Claude) |
| Sub-LLM | Processes chunks | rlm-subcall agent (Haiku) |
| Environment | Maintains state | Python REPL (rlm_repl.py) |

## Quick Start

```bash
# 1. Initialize with your large file
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py init /path/to/large/file.txt

# 2. Check status
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py status

# 3. Scout the content (NOTE: -c flag is REQUIRED)
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py exec -c "print(peek(0, 3000))"

# 4. Create chunks for subagent processing (multi-line via stdin)
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py exec <<'PY'
paths = write_chunks('.claude/rlm_state/chunks', size=200000, overlap=0)
print(f"Created {len(paths)} chunks")
PY
```

## REPL Commands

| Command | Purpose |
|---|---|
| `init <path>` | Load context file into the REPL |
| `status` | Show current state summary |
| `exec -c "code"` | Execute Python with persisted state |
| `exec <<'PY'` | Multi-line code via stdin |
| `reset` | Clear the state file |
| `export-buffers <out>` | Export collected buffers |

## Helper Functions

```python
# Peek at content
peek(start=0, end=1000)  # Get a substring

# Search with context
grep(pattern, max_matches=20, window=120)  # Find matches with surrounding context

# Calculate chunk boundaries
chunk_indices(size=200000, overlap=0)  # Returns [(start, end), ...]

# Write chunks to files for the subagent
paths = write_chunks(out_dir, size=200000, overlap=0)

# Store intermediate results
add_buffer(text)
```
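For intuition, here is a minimal sketch of how `chunk_indices()` and `write_chunks()` could be implemented over the loaded context; the actual rlm_repl.py implementation may differ.

```python
# Illustrative sketch only -- the real rlm_repl.py may differ.
# Assumes the loaded file is held as a single string, CTX.
from pathlib import Path

CTX = "..."  # populated by `init` in the real REPL

def chunk_indices(size=200000, overlap=0):
    """Return [(start, end), ...] boundaries covering all of CTX."""
    step = size - overlap
    return [(i, min(i + size, len(CTX))) for i in range(0, len(CTX), step)]

def write_chunks(out_dir, size=200000, overlap=0):
    """Write each chunk to chunk_NNNN.txt and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, (start, end) in enumerate(chunk_indices(size, overlap)):
        p = out / f"chunk_{n:04d}.txt"
        p.write_text(CTX[start:end])
        paths.append(str(p))
    return paths
```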

## Workflow Pattern

```text
1. INIT: Load large file into REPL
   └── python3 rlm_repl.py init <file>

2. SCOUT: Quick reconnaissance
   └── peek() at start/end
   └── grep() for patterns

3. CHUNK: Split for processing
   └── write_chunks() creates files

4. DELEGATE: Send chunks to sub-LLM
   └── rlm-subcall processes each chunk
   └── Returns structured JSON

5. SYNTHESIZE: Combine results
   └── Main conversation integrates findings
```
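The delegate step is performed by the root model, not a script. As a rough illustration, a delegate loop over the chunk files might look like the sketch below, where `call_rlm_subcall()` is a hypothetical stand-in for however the agent runtime invokes the rlm-subcall subagent; it is not a function provided by this skill.

```python
# Hypothetical illustration of step 4 (DELEGATE). call_rlm_subcall()
# stands in for the agent runtime's subagent invocation mechanism and
# is not part of rlm_repl.py.
from pathlib import Path

def call_rlm_subcall(chunk_text: str, query: str) -> dict:
    """Hypothetical: run the Haiku subagent on one chunk, return its JSON."""
    raise NotImplementedError("delegated to the agent runtime")

def delegate(chunk_dir: str, query: str) -> list[dict]:
    results = []
    for path in sorted(Path(chunk_dir).glob("chunk_*.txt")):
        results.append(call_rlm_subcall(path.read_text(), query))
    return results
```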

## Sub-LLM Output Format

The rlm-subcall agent returns JSON:

```json
{
  "chunk_id": "chunk_0001",
  "relevant": [
    {
      "point": "Key finding",
      "evidence": "Short quote",
      "confidence": "high|medium|low"
    }
  ],
  "missing": ["What couldn't be determined"],
  "suggested_next_queries": ["Follow-up questions"],
  "answer_if_complete": "Full answer if chunk alone suffices"
}
```
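One way to fold these per-chunk results together before the final synthesis is a small aggregator like the following. The field names match the schema above; the merge policy (keep high-confidence points, dedupe gaps and follow-ups) is an assumption, not part of the skill.

```python
# Illustrative aggregator for the per-chunk JSON results above.
# Field names follow the schema; the merge policy is an assumption.
def synthesize(results: list[dict]) -> dict:
    findings, gaps, follow_ups = [], [], []
    for r in results:
        # Surface only high-confidence findings for the final report.
        findings += [p for p in r.get("relevant", []) if p.get("confidence") == "high"]
        gaps += r.get("missing", [])
        follow_ups += r.get("suggested_next_queries", [])
    return {
        "findings": findings,
        "open_questions": sorted(set(gaps)),
        "next_queries": sorted(set(follow_ups)),
    }
```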

## Example Session

```bash
# User: "Analyze this 50 MB log file for errors"

# 1. Initialize
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py init /var/log/huge.log
# Output: Loaded context: 52,000,000 chars

# 2. Scout for error patterns (raise max_matches above the default of 20)
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py exec -c "print(len(grep('ERROR', max_matches=5000)))"
# Output: 1247 matches

# 3. Create focused chunks
python3 ~/.config/opencode/skills/rlm/scripts/rlm_repl.py exec <<'PY'
# Extract error context windows
errors = grep('ERROR', max_matches=100, window=500)
for i, e in enumerate(errors):
    add_buffer(f"--- Error {i+1} ---\n{e['snippet']}\n")
print(f"Buffered {len(errors)} error contexts")
PY

# 4. Delegate analysis (agent invokes rlm-subcall)
# 5. Synthesize final report
```

## Chunking Strategies

| Strategy | When to Use | How |
|---|---|---|
| Fixed size | Uniform content | `size=200000, overlap=0` |
| With overlap | Need context continuity | `size=200000, overlap=10000` |
| Semantic | Structured formats | Custom `grep()` + write |
| By pattern | Log timestamps, markdown headings | Custom `chunk_indices()` (see sketch below) |
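As a sketch of the "by pattern" strategy, the following snaps chunk boundaries back to timestamped line starts so no log entry is split across chunks. The regex is an assumption about the log format, and the function is a hypothetical companion to `chunk_indices()`, not part of rlm_repl.py.

```python
# Sketch of pattern-aligned chunking (assumes lines beginning with an
# ISO-8601 timestamp). Not part of rlm_repl.py.
import re

TS = r"(?m)^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"

def chunk_indices_by_pattern(ctx: str, size=200000, pattern=TS):
    """Like chunk_indices(), but snap each cut back to the start of the
    nearest preceding timestamped line so entries stay intact."""
    anchors = [m.start() for m in re.finditer(pattern, ctx)]
    bounds, start = [], 0
    while start < len(ctx):
        end = min(start + size, len(ctx))
        if end < len(ctx):
            inside = [a for a in anchors if start < a <= end]
            if inside:
                end = inside[-1]  # cut at a timestamp, not mid-entry
        bounds.append((start, end))
        start = end
    return bounds
```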

## Decision Flowchart

```text
Need to process large content?
    │
    ├── Is it already cached?
    │       └── YES → jdocmunch/jcodemunch (99% token savings)
    │       └── NO ↓
    │
    ├── Is it a single file >100K chars?
    │       └── NO → Read normally
    │       └── YES ↓
    │
    ├── Use RLM workflow
    │       └── init → scout → chunk → delegate → synthesize
    │
    └── Index result for future use
            └── Save to ~/.rag-cache/ → jdocmunch
```

## Anti-Patterns

| Anti-Pattern | Consequence | Fix |
|---|---|---|
| Using RLM for cached content | Unnecessary complexity | Use jdocmunch directly |
| Pasting large chunks in chat | Context bloat | Use the REPL + subagent |
| Not chunking | Timeouts/errors | Always chunk large files |
| Skipping the scout phase | Blind processing | Peek first; understand the structure |

## State Persistence

```text
.claude/rlm_state/
├── state.pkl           # Pickled REPL state
└── chunks/             # Generated chunk files
    ├── chunk_0000.txt
    ├── chunk_0001.txt
    └── ...
```
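Persistence between `exec` calls could be as simple as pickling the REPL namespace into state.pkl. A minimal sketch, assuming that layout (the real rlm_repl.py may store more or less than this):

```python
# Minimal sketch of pickle-based state persistence between `exec`
# invocations. The real rlm_repl.py may persist a different structure.
import pickle
from pathlib import Path

STATE = Path(".claude/rlm_state/state.pkl")

def load_state() -> dict:
    return pickle.loads(STATE.read_bytes()) if STATE.exists() else {}

def save_state(state: dict) -> None:
    STATE.parent.mkdir(parents=True, exist_ok=True)
    STATE.write_bytes(pickle.dumps(state))
```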

## Integration with Other Skills

- **rag-cache**: After RLM processing, cache results for future queries
- **efficient-editing**: When editing chunk files, use Edit, not Write
- **ntfy-afk**: Notify if processing will take >5 minutes

**Version:** 1.0.0 · **Last Updated:** 2026-03-19 · **Based on:** MIT RLM Paper (arXiv:2512.24601), Vercel AGENTS.md research
