@tokenbender
Created February 23, 2026 17:31
Paper OCR Agent & Skills - Architecture Documentation

Paper OCR System - Complete Architecture

Overview

A nested agent architecture for high-accuracy PDF OCR and structured note generation.


Location

Everything lives in ~/.config/opencode/:

| Type | Path |
|---|---|
| Agents | ~/.config/opencode/agents/ |
| Skills | ~/.config/opencode/skills/ |

Two Agents

1. paper-ocr-notes (End-to-End Paper Processor)

File: ~/.config/opencode/agents/paper-ocr-notes.md

  • Purpose: Full pipeline from PDF/URL → high-quality notes.md
  • Uses skills: paper-ocr-notes-pipeline, pdf-ocr-feedback
  • Workflow: Resolve paper → OCR → Validate identity → Produce notes + artifacts
---
description: End-to-end paper processor for OCR refinement and high-quality notes with self-contained policy
mode: subagent
tools:
  skill: true
permission:
  skill:
    "paper-ocr-notes-pipeline": allow
    "pdf-ocr-feedback": allow
---

2. ocr-refiner (OCR Specialist)

File: ~/.config/opencode/agents/ocr-refiner.md

  • Purpose: High-accuracy OCR (self-evaluated rubric score ≥95 out of 100) using Maj@K consensus voting, i.e. a majority vote over K independent OCR passes
  • Uses skill: pdf-ocr-feedback
  • Core technique: Pass-1 → Self-Eval → Escalate if <95 → K passes consensus vote → Targeted repair
---
description: OCR specialist with Maj@K voting, self-evaluation, and adaptive compute for high-accuracy page refinement
mode: subagent
tools:
  skill: true
permission:
  skill:
    "pdf-ocr-feedback": allow
---
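
The ocr-refiner control flow (Pass-1 → Self-Eval → Escalate → Consensus → Targeted repair) can be sketched as follows. This is a minimal illustration under stated assumptions, not the agent's actual implementation: `refine_page` and the four callables `ocr_pass`, `evaluate`, `merge`, and `repair` are hypothetical names, and `evaluate` is assumed to already apply the red-flag cap. The thresholds (95, a gain below 2 points, 3 iterations) are taken from the stopping check in the flow diagram.

```python
from typing import Callable, List, Tuple

ACCEPT_SCORE = 95    # quality bar stated in the document
MIN_GAIN = 2         # stop when a repair round gains fewer than 2 points
MAX_ITERATIONS = 3   # hard cap on repair rounds per page


def refine_page(
    ocr_pass: Callable[[], str],
    evaluate: Callable[[str], int],
    merge: Callable[[List[str]], str],
    repair: Callable[[str], str],
    k: int = 3,
) -> Tuple[str, int, str]:
    """Return (text, score, reason) for one page. All callables are hypothetical."""
    text = ocr_pass()
    score = evaluate(text)
    if score >= ACCEPT_SCORE:
        return text, score, "accepted on pass 1"  # cheap exit

    # Escalate: K-1 additional independent passes, then a consensus merge.
    passes = [text] + [ocr_pass() for _ in range(k - 1)]
    text = merge(passes)
    score = evaluate(text)
    if score >= ACCEPT_SCORE:
        return text, score, "accepted after consensus"

    # Targeted repair with a diminishing-returns stopping check.
    for _ in range(MAX_ITERATIONS):
        candidate = repair(text)
        new_score = evaluate(candidate)
        gain = new_score - score
        if new_score > score:  # keep a repair only if it improves the score
            text, score = candidate, new_score
        if score >= ACCEPT_SCORE:
            return text, score, "accepted after repair"
        if gain < MIN_GAIN:
            return text, score, "accepted with note: diminishing returns"
    return text, score, "accepted with note: iteration cap reached"
```

Passing the OCR, scoring, and repair steps in as callables keeps the control flow testable without a model in the loop.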

Two Skills

1. pdf-ocr-feedback (OCR Engine)

Dir: ~/.config/opencode/skills/pdf-ocr-feedback/

  • What it does: Pagewise OCR with Maj@K voting, self-evaluation rubric, and adaptive compute
  • Pipeline: Pass-1 → Score (0-100) → If <95: generate K passes → Line-level consensus → Span repair
  • K values: K=3 (default), K=5 (hard pages: equations, tables, multi-column)
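
A line-level consensus vote over K passes might look like the sketch below. It makes simplifying assumptions that the skill itself may not share: passes are compared line by line at the same index (no fuzzy alignment), and ties or lines without a winner are marked `[uncertain]` with the first pass's reading instead of being resolved from context. `choose_k` and `consensus_merge` are hypothetical names.

```python
from collections import Counter
from itertools import zip_longest
from typing import List

UNCERTAIN = "[uncertain]"


def choose_k(hard_page: bool) -> int:
    """K=5 for hard pages (equations, tables, multi-column), otherwise K=3."""
    return 5 if hard_page else 3


def consensus_merge(passes: List[str]) -> str:
    """Merge K transcriptions of one page by voting on each line."""
    rows = zip_longest(*(p.splitlines() for p in passes), fillvalue="")
    merged = []
    for candidates in rows:
        # Rank readings by vote count; equal counts keep first-seen order.
        ranked = Counter(c.rstrip() for c in candidates).most_common()
        best, votes = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == votes
        if votes >= 2 and not tied:
            merged.append(best)  # 2+ passes agree and there is a clear winner
        else:
            merged.append(f"{UNCERTAIN} {best}".rstrip())  # flag for span repair
    return "\n".join(merged)
```

For example, three passes that read a line as `x = 1`, `x = l`, and `x = 1` yield `x = 1`, while three mutually different readings yield an `[uncertain]` line that the targeted span repair can pick up.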

Scoring Rubric (5 dimensions):

| Dimension | Points |
|---|---|
| Structural Fidelity | 0-25 |
| Completeness | 0-25 |
| Character/Numeric Accuracy | 0-20 |
| Layout-Sensitive Content | 0-20 |
| Noise/Garbling | 0-10 |
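
As a worked example of how the rubric combines with the acceptance rule, the sketch below sums the five dimensions and applies the red-flag cap of 90 mentioned in the flow diagram. The function names and dictionary keys are assumptions made for illustration; only the point ranges, the cap, and the 95 threshold come from this document.

```python
from typing import Dict

# Maximum points per dimension, from the rubric (sums to 100).
RUBRIC_MAX = {
    "structural_fidelity": 25,
    "completeness": 25,
    "character_numeric_accuracy": 20,
    "layout_sensitive_content": 20,
    "noise_garbling": 10,
}
ACCEPT_SCORE = 95
RED_FLAG_CAP = 90  # any red flag caps the score below the acceptance bar


def total_score(points: Dict[str, int], red_flags: bool = False) -> int:
    """Sum the per-dimension points, capping at 90 when red flags are present."""
    if set(points) != set(RUBRIC_MAX):
        raise ValueError("expected exactly the five rubric dimensions")
    for name, value in points.items():
        if not 0 <= value <= RUBRIC_MAX[name]:
            raise ValueError(f"{name} must be within 0-{RUBRIC_MAX[name]}")
    total = sum(points.values())
    return min(total, RED_FLAG_CAP) if red_flags else total


def needs_escalation(points: Dict[str, int], red_flags: bool = False) -> bool:
    """True when the page must go through Maj@K escalation."""
    return total_score(points, red_flags) < ACCEPT_SCORE
```

A page scored 25 + 24 + 19 + 19 + 10 = 97 is accepted on the first pass, but the same page with a red flag is capped at 90 and therefore escalated, since the cap sits below the threshold.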

2. paper-ocr-notes-pipeline (Full Workflow)

Dir: ~/.config/opencode/skills/paper-ocr-notes-pipeline/

  • What it does: Complete paper ingestion → notes.md production
  • Output folder: <output-root>/<paper-slug>/, containing the source PDF, notes.md, README.md, the final OCR text, and the OCR comparison report
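
The output layout can be expressed as a small helper. This is a sketch only: the slug rule in `slugify` is an assumption (the document does not say how `<paper-slug>` is derived), and `expected_artifacts` is a hypothetical name. The file names follow the artifact list in the final box of the flow diagram.

```python
import re
from pathlib import Path
from typing import List


def slugify(title: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def expected_artifacts(output_root: str, title: str, pdf_stem: str) -> List[Path]:
    """List the files the pipeline is expected to leave in the paper folder."""
    folder = Path(output_root) / slugify(title)
    return [
        folder / f"{pdf_stem}.pdf",                               # source file
        folder / "notes.md",                                      # primary deliverable
        folder / "README.md",                                     # provenance + artifact index
        folder / f"{pdf_stem}.ocr.feedback-loop.txt",             # final OCR output
        folder / f"{pdf_stem}.ocr.feedback-loop-comparison.md",   # quality report
    ]
```

A check such as `all(p.exists() for p in expected_artifacts(...))` could then serve as a final sanity test before the run is reported as complete.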

Phases (numbered as in the flow diagram):

  0. Intake & Setup
  1. OCR Feedback Loop
  1.5. Identity Gate
  2. Ground-truth Extraction
  3. Compose Notes
  4. Exploration & Consistency Checks
  5. Revision Passes
  6. Quality Gates
Notes Structure: TL;DR → Beginner's Guide → Paper Summary → Deep Dive → Critical Analysis → Learning Path → Key Equations → References
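
The notes structure lends itself to a skeleton generator plus a completeness check. The sketch below is illustrative: `notes_skeleton` and `missing_sections` are hypothetical helpers, the `TODO` placeholder is an assumption, and the heading text follows the Phase 3 box of the flow diagram (which names the equations section "Key Equations Summary").

```python
from typing import List

# Section headings in the order the notes are composed.
SECTIONS = [
    "TL;DR",
    "Beginner's Guide",
    "Paper Summary",
    "Deep Dive",
    "Critical Analysis",
    "Learning Path",
    "Key Equations Summary",
    "References",
]


def notes_skeleton(title: str) -> str:
    """Build an empty notes.md with the title and every section heading."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


def missing_sections(notes_md: str) -> List[str]:
    """Return the section headings that do not appear in the notes."""
    present = {line[3:].strip() for line in notes_md.splitlines() if line.startswith("## ")}
    return [s for s in SECTIONS if s not in present]
```

Running `missing_sections` on a finished notes.md gives a cheap structural check before the more subjective quality gates.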


How They Connect

paper-ocr-notes (outer agent)
    │
    ├── loads skill: paper-ocr-notes-pipeline (workflow orchestration)
    │
    └── calls skill: pdf-ocr-feedback (OCR engine)
            │
            └── optionally delegates to: ocr-refiner (inner agent) for heavy OCR work

Complete Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                      USER INPUT                                         │
│                         (PDF path / paper URL / arXiv abs URL)                          │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                                            ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                              AGENT: paper-ocr-notes                                     │
│                    ~/.config/opencode/agents/paper-ocr-notes.md                         │
│                                                                                         │
│  Loads: paper-ocr-notes-pipeline (master policy)                                        │
│         pdf-ocr-feedback (OCR engine)                                                   │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                ┌───────────────────────────┴───────────────────────────┐
                │                                                       │
                ▼                                                       │
┌───────────────────────────────────┐                                   │
│     PHASE 0: INTAKE & SETUP       │                                   │
├───────────────────────────────────┤                                   │
│ • Resolve identity (title,        │                                   │
│   authors, arXiv/DOI)             │                                   │
│ • Create output folder:           │                                   │
│   <root>/<paper-slug>/            │                                   │
│ • Store source PDF                │                                   │
│ • Init notes.md skeleton          │                                   │
│ • Create paper-local README.md    │                                   │
└───────────────────────────────────┘                                   │
                │                                                       │
                ▼                                                       │
┌───────────────────────────────────────────────────────────────────────┴─────────────────┐
│                        PHASE 1: OCR FEEDBACK LOOP                                       │
│                                                                                         │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                      SKILL: pdf-ocr-feedback                                    │   │
│   │           ~/.config/opencode/skills/pdf-ocr-feedback/                           │   │
│   │                                                                                 │   │
│   │   ┌─────────────────────────────────────────────────────────────────────────┐   │   │
│   │   │                    (Optional) AGENT: ocr-refiner                        │   │   │
│   │   │          ~/.config/opencode/agents/ocr-refiner.md                       │   │   │
│   │   │              (Specialized OCR sub-agent if needed)                      │   │   │
│   │   └─────────────────────────────────────────────────────────────────────────┘   │   │
│   │                                                                                 │   │
│   │   FOR EACH PAGE:                                                                │   │
│   │   ┌─────────────┐                                                               │   │
│   │   │  PASS-1 OCR │ ──────► Full page transcription                               │   │
│   │   └──────┬──────┘                                                               │   │
│   │          │                                                                      │   │
│   │          ▼                                                                      │   │
│   │   ┌─────────────────────────────────────────────────────────────┐               │   │
│   │   │              SELF-EVALUATION (Evaluator Role)               │               │   │
│   │   ├─────────────────────────────────────────────────────────────┤               │   │
│   │   │  Score 0-100 across 5 dimensions:                           │               │   │
│   │   │  • Structural Fidelity     (0-25)                           │               │   │
│   │   │  • Completeness            (0-25)                           │               │   │
│   │   │  • Character/Numeric Acc   (0-20)                           │               │   │
│   │   │  • Layout-Sensitive        (0-20)                           │               │   │
│   │   │  • Noise/Garbling          (0-10)                           │               │   │
│   │   │                                                             │               │   │
│   │   │  + Spot-check 3-5 high-risk snippets                        │               │   │
│   │   │  + Check for RED FLAGS (cap at 90, force retry)             │               │   │
│   │   └──────────────────────────┬──────────────────────────────────┘               │   │
│   │                              │                                                  │   │
│   │              ┌───────────────┴───────────────┐                                  │   │
│   │              │                               │                                  │   │
│   │              ▼                               ▼                                  │   │
│   │   ┌──────────────────┐            ┌──────────────────────────────────┐          │   │
│   │   │  Score ≥ 95 &    │            │  Score < 95 OR Red Flags         │          │   │
│   │   │  No Red Flags    │            │                                  │          │   │
│   │   │                  │            │  ┌────────────────────────────┐  │          │   │
│   │   │    ✓ ACCEPT      │            │  │    MAJ@K ESCALATION        │  │          │   │
│   │   │   (cheap exit)   │            │  ├────────────────────────────┤  │          │   │
│   │   └──────────────────┘            │  │ • K=3 (default pages)      │  │          │   │
│   │                                   │  │ • K=5 (hard pages:         │  │          │   │
│   │                                   │  │   equations, tables,       │  │          │   │
│   │                                   │  │   multi-column, noisy)     │  │          │   │
│   │                                   │  │                            │  │          │   │
│   │                                   │  │ Generate K-1 additional    │  │          │   │
│   │                                   │  │ INDEPENDENT passes         │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │               ▼                  │          │   │
│   │                                   │  ┌────────────────────────────┐  │          │   │
│   │                                   │  │  LINE-LEVEL CONSENSUS VOTE │  │          │   │
│   │                                   │  ├────────────────────────────┤  │          │   │
│   │                                   │  │ • 2+ passes agree → accept │  │          │   │
│   │                                   │  │ • Disputed spans → vote    │  │          │   │
│   │                                   │  │ • Ties → context decides   │  │          │   │
│   │                                   │  │ • No winner → mark         │  │          │   │
│   │                                   │  │   [uncertain]              │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │               ▼                  │          │   │
│   │                                   │  ┌────────────────────────────┐  │          │   │
│   │                                   │  │   RE-EVALUATE MERGED       │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │       ┌───────┴───────┐          │          │   │
│   │                                   │       │               │          │          │   │
│   │                                   │       ▼               ▼          │          │   │
│   │                                   │   ≥ 95?           < 95?          │          │   │
│   │                                   │     │               │            │          │   │
│   │                                   │     ▼               ▼            │          │   │
│   │                                   │  ✓ ACCEPT    ┌─────────────────┐ │          │   │
│   │                                   │              │ TARGETED SPAN   │ │          │   │
│   │                                   │              │ REPAIR          │ │          │   │
│   │                                   │              │ (only flagged   │ │          │   │
│   │                                   │              │  [uncertain]    │ │          │   │
│   │                                   │              │  regions)       │ │          │   │
│   │                                   │              └────────┬────────┘ │          │   │
│   │                                   │                       │          │          │   │
│   │                                   │                       ▼          │          │   │
│   │                                   │              ┌─────────────────┐ │          │   │
│   │                                   │              │ STOPPING CHECK  │ │          │   │
│   │                                   │              │ • ≥95 → ACCEPT  │ │          │   │
│   │                                   │              │ • <2pt gain →   │ │          │   │
│   │                                   │              │   ACCEPT w/note │ │          │   │
│   │                                   │              │ • 3 iterations  │ │          │   │
│   │                                   │              │   → hard cap    │ │          │   │
│   │                                   └──────────────┴─────────────────┴─┘          │   │
│   │                                                                                 │   │
│   │   OUTPUT: Merged pages with ===== PAGE N ===== delimiters                       │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                         │
│   ARTIFACTS PRODUCED:                                                                   │
│   • <paper>.ocr.feedback-loop.txt                                                       │
│   • <paper>.ocr.feedback-loop-comparison.md                                             │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────┐
│   PHASE 1.5: IDENTITY GATE        │
├───────────────────────────────────┤
│ Verify anchors before trusting:   │
│ • Title matches target?           │
│ • Authors match?                  │
│ • arXiv/DOI consistent?           │
│                                   │
│ MISMATCH? → Reject → Re-extract   │
└───────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────┐
│   PHASE 2: GROUND-TRUTH           │
│            EXTRACTION             │
├───────────────────────────────────┤
│ Facts only (no interpretation):   │
│ • Problem statement               │
│ • Contributions as stated         │
│ • Method overview                 │
│ • Key equations/definitions       │
│ • Experimental setup              │
│ • Quantitative results            │
│ • Stated limitations              │
└───────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 3: COMPOSE notes.md                                          │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   1. # <Paper Title>                                                                  │
│   2. ## TL;DR ─────────────────── problem, key idea, 1-2 evidence numbers, caveat     │
│   3. ## Beginner's Guide ──────── college-level, symbols defined, toy example         │
│   4. ## Paper Summary ─────────── problem, contributions, results, experiments table  │
│   5. ## Deep Dive ─────────────── definitions, derivations, implementation sketch     │
│   6. ## Critical Analysis ─────── assumptions, objections, failure modes, tests       │
│   7. ## Learning Path ─────────── prereqs, next steps, diagnostics                    │
│   8. ## Key Equations Summary ─── equation + one-line meaning                         │
│   9. ## References                                                                    │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 4: EXPLORATION & CONSISTENCY CHECKS                          │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   Problem Framing          Method Checks              Claims Checks                   │
│   ─────────────────        ─────────────              ─────────────                   │
│   • What breaks?           • What's optimized?        • Theory vs empirical?          │
│   • Boundary conditions?   • Additive vs mult?        • Evidence sources?             │
│                            • Probs vs grads?          • Missing ablations?            │
│                                                                                       │
│   Consistency Checks                                                                  │
│   ──────────────────                                                                  │
│   • Terminology consistent?  • Equations match implementation?                        │
│   • Baseline independence?   • Proxy computability?                                   │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 5: REVISION PASSES (mandatory)                               │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   Pass 1: CLARITY              Pass 2: CORRECTNESS         Pass 3: OBJECTION TEST     │
│   ───────────────              ──────────────────          ───────────────────        │
│   • Remove duplicates          • Fix conflations           • Strongest objection?     │
│   • Defs before eqs            • Remove unsupported        • Rebuttal conditions?     │
│   • Soften overconfidence      • Fix pseudocode            • Falsifiable tests?       │
│                                                                                       │
│   Pass 4: UPDATE TL;DR CAVEAT                                                         │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 6: QUALITY GATES (must pass all)                             │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   ☐ Accuracy ─────────── symbols/claims correct, evidence-backed                      │
│   ☐ Completeness ─────── experiments table has tasks, baselines, metrics              │
│   ☐ Teachability ─────── beginner section understandable                              │
│   ☐ Critical Rigor ───── ≥1 strong objection + test plan                              │
│   ☐ Implementation ───── pseudocode coherent                                          │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                    FINAL OUTPUT                                         │
│                              <output-root>/<paper-slug>/                                │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                         │
│   📄 <paper>.pdf ──────────────────────────── source file                               │
│   📄 notes.md ─────────────────────────────── PRIMARY DELIVERABLE                       │
│   📄 README.md ────────────────────────────── provenance + artifact index               │
│   📄 <paper>.ocr.feedback-loop.txt ────────── final OCR output                          │
│   📄 <paper>.ocr.feedback-loop-comparison.md  quality/replacement report                │
│                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘
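
Two steps in the diagram above are mechanical enough to sketch in code: splitting the merged OCR output on its `===== PAGE N =====` delimiters, and the Phase 1.5 identity gate. The helpers below are hypothetical, and the matching rule (case-insensitive substring match on the first page after stripping punctuation) is an assumption; the pipeline may compare anchors differently.

```python
import re
from typing import Dict, List, Optional

PAGE_DELIM = re.compile(r"^===== PAGE (\d+) =====$", re.MULTILINE)


def split_pages(ocr_text: str) -> Dict[int, str]:
    """Map page number -> page text from the delimited OCR output."""
    parts = PAGE_DELIM.split(ocr_text)
    # With one capture group, parts is [preamble, "1", page1, "2", page2, ...].
    return {int(n): body.strip() for n, body in zip(parts[1::2], parts[2::2])}


def _norm(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def identity_mismatches(
    first_page: str, title: str, authors: List[str], arxiv_id: Optional[str] = None
) -> List[str]:
    """Return the anchors that fail to match; an empty list passes the gate."""
    page = _norm(first_page)
    problems = []
    if _norm(title) not in page:
        problems.append("title")
    if not all(_norm(author) in page for author in authors):
        problems.append("authors")
    if arxiv_id is not None and _norm(arxiv_id) not in page:
        problems.append("arxiv_id")
    return problems
```

Any non-empty result corresponds to the "MISMATCH? → Reject → Re-extract" branch of the identity gate.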

Design Philosophy

Bitter Lesson Alignment

What's aligned:

  • Maj@K voting: uses scale (more passes) to improve quality
  • Self-evaluation loop: the model judges its own output instead of relying on external rules
  • Targeted repair: focuses compute where it's needed
  • Stopping on diminishing returns: adaptive compute budget

What's template-driven (but justified):

  • The 9-section notes.md structure encodes how humans want to consume dense information, not domain-specific knowledge
  • This is an audience constraint, not a model constraint: "explain simply before explaining deeply" is pedagogy, not a heuristic that scale will make obsolete

Hardcoded Elements (acknowledged trade-offs)

| Element | Value | Why It Exists |
|---|---|---|
| K values | 3 or 5 | Compute budget heuristic |
| Score threshold | 95 | Quality bar |
| Rubric weights | 25/25/20/20/10 | Human-intuited importance |
| Hard page detection | equations, 3+ cols, etc. | Content heuristic |
| Max iterations | 3 per page, 5 global | Compute cap |

These could be made adaptive with more engineering, but the current values work well in practice.
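
One low-cost step toward making these values adjustable is to collect them in a single configuration object. The sketch below is hypothetical: `OcrBudget`, its field names, and `is_hard_page` are not part of the skills, and the reading of "5 global" as a separate run-wide cap is an assumption. The default values are the ones listed in the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OcrBudget:
    """Hardcoded knobs from the table, gathered in one place."""
    k_default: int = 3
    k_hard: int = 5
    accept_score: int = 95
    rubric_weights: Tuple[int, ...] = (25, 25, 20, 20, 10)
    max_iterations_per_page: int = 3
    max_iterations_global: int = 5  # assumed meaning of "5 global"

    def __post_init__(self) -> None:
        # The rubric is scored out of 100, so the weights must sum to 100.
        if sum(self.rubric_weights) != 100:
            raise ValueError("rubric weights must sum to 100")


def is_hard_page(has_equations: bool, has_tables: bool, column_count: int) -> bool:
    """Content heuristic: equations, tables, or three or more columns."""
    return has_equations or has_tables or column_count >= 3


def passes_for(budget: OcrBudget, hard: bool) -> int:
    """Number of passes K to use for a page under the given budget."""
    return budget.k_hard if hard else budget.k_default
```

Because the dataclass is frozen, an experiment with different values goes through `dataclasses.replace(budget, accept_score=...)`, which also re-runs the weight check.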
