Paper OCR System - Complete Architecture

Overview

A nested agent architecture for high-accuracy PDF OCR and structured note generation.

Location

Everything lives in ~/.config/opencode/:

Type	Path
Agents	`~/.config/opencode/agents/`
Skills	`~/.config/opencode/skills/`

Two Agents

1. `paper-ocr-notes` (End-to-End Paper Processor)

File: ~/.config/opencode/agents/paper-ocr-notes.md

Purpose: Full pipeline from PDF/URL → high-quality notes.md
Uses skills: paper-ocr-notes-pipeline, pdf-ocr-feedback
Workflow: Resolve paper → OCR → Validate identity → Produce notes + artifacts

---
description: End-to-end paper processor for OCR refinement and high-quality notes with self-contained policy
mode: subagent
tools:
  skill: true
permission:
  skill:
    "paper-ocr-notes-pipeline": allow
    "pdf-ocr-feedback": allow
---

2. `ocr-refiner` (OCR Specialist)

File: ~/.config/opencode/agents/ocr-refiner.md

Purpose: High-accuracy OCR (self-evaluation score ≥95 out of 100) using Maj@K consensus voting
Uses skill: pdf-ocr-feedback
Core technique: Pass-1 → Self-Eval → Escalate if score <95 → Maj@K consensus vote over K passes → Targeted span repair

---
description: OCR specialist with Maj@K voting, self-evaluation, and adaptive compute for high-accuracy page refinement
mode: subagent
tools:
  skill: true
permission:
  skill:
    "pdf-ocr-feedback": allow
---
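The consensus step can be sketched as a line-level vote. This assumes the K passes are already aligned line by line (real passes can differ in line count and would need an alignment step first). `choose_k`, `vote_line`, `consensus`, the hard-page feature names, and the `[uncertain]` output format are hypothetical; the rules follow the flow diagram: 2+ agreeing passes win, ties go to a context tiebreaker, and lines with no winner are flagged `[uncertain]`. K=3 is the default and K=5 is used for hard pages.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set

# Hypothetical feature labels for "hard" pages (equations, tables, multi-column, noisy).
HARD_PAGE_FEATURES = {"equations", "tables", "multi-column", "noisy"}

TieBreak = Callable[[List[str]], Optional[str]]


def choose_k(page_features: Set[str]) -> int:
    """K=5 for hard pages, K=3 otherwise."""
    return 5 if page_features & HARD_PAGE_FEATURES else 3


def vote_line(candidates: Sequence[str], tiebreak: Optional[TieBreak] = None) -> str:
    """Vote over K candidate readings of the same line."""
    ranked = Counter(candidates).most_common()
    top_count = ranked[0][1]
    leaders = [text for text, n in ranked if n == top_count]
    if top_count >= 2 and len(leaders) == 1:
        return leaders[0]  # 2+ passes agree on a single winner
    if top_count >= 2 and tiebreak is not None:
        choice = tiebreak(leaders)  # tie: let surrounding context decide
        if choice is not None:
            return choice
    # No winner: keep the pass-1 reading but flag it for targeted repair.
    return f"[uncertain] {candidates[0]}"


def consensus(passes: Sequence[Sequence[str]], tiebreak: Optional[TieBreak] = None) -> List[str]:
    """Merge K line-aligned passes into one transcription."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be line-aligned (same number of lines)")
    return [vote_line(lines, tiebreak) for lines in zip(*passes)]
```

Keeping the pass-1 text behind the `[uncertain]` marker gives the targeted repair step a concrete span to re-examine.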

Two Skills

1. `pdf-ocr-feedback` (OCR Engine)

Dir: ~/.config/opencode/skills/pdf-ocr-feedback/

What it does: Pagewise OCR with Maj@K voting, self-evaluation rubric, and adaptive compute
Pipeline: Pass-1 → Score (0-100) → If <95 or red flags: run K-1 additional independent passes (K total) → Line-level consensus → Targeted span repair
K values: K=3 (default), K=5 (hard pages: equations, tables, multi-column)
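The pipeline above is a small control loop. A minimal sketch, assuming the OCR pass, evaluator, consensus merge, and span repair are supplied as callables (all names here are hypothetical). The thresholds follow the stopping check in the flow diagram: accept at ≥95 with no red flags, accept with a note when a repair gains fewer than 2 points, and stop after 3 iterations; treating those 3 iterations as repair rounds is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PageResult:
    text: str
    score: int
    note: str


def refine_page(
    ocr_pass: Callable[[int], str],               # hypothetical: pass index -> transcription
    evaluate: Callable[[str], Tuple[int, bool]],  # hypothetical: text -> (score 0-100, red flags?)
    merge: Callable[[List[str]], str],            # hypothetical: line-level consensus vote
    repair: Callable[[str], str],                 # hypothetical: re-OCR only [uncertain] spans
    k: int = 3,
    threshold: int = 95,
    min_gain: int = 2,
    max_repairs: int = 3,
) -> PageResult:
    def ok(score: int, red_flags: bool) -> bool:
        return score >= threshold and not red_flags

    # Pass 1 plus self-evaluation; a clean page takes the cheap exit.
    text = ocr_pass(0)
    score, red = evaluate(text)
    if ok(score, red):
        return PageResult(text, score, "accepted after pass 1")

    # Maj@K escalation: K-1 additional independent passes, then a consensus merge.
    passes = [text] + [ocr_pass(i) for i in range(1, k)]
    text = merge(passes)
    score, red = evaluate(text)
    if ok(score, red):
        return PageResult(text, score, "accepted after consensus")

    # Targeted span repair with the stopping check.
    for _ in range(max_repairs):
        repaired = repair(text)
        new_score, red = evaluate(repaired)
        gain = new_score - score
        text, score = repaired, new_score
        if ok(score, red):
            return PageResult(text, score, "accepted after repair")
        if gain < min_gain:
            return PageResult(text, score, f"accepted with note: gain {gain} < {min_gain}")
    return PageResult(text, score, "accepted with note: repair cap reached")


def join_pages(pages: List[str]) -> str:
    """Concatenate refined pages using the ===== PAGE N ===== delimiter format."""
    return "\n".join(f"===== PAGE {n} =====\n{body}" for n, body in enumerate(pages, start=1))
```

Passing the model calls in as callables keeps the control flow testable with stubs, separate from whichever OCR model is actually used.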

Scoring Rubric (5 dimensions):

Dimension	Points
Structural Fidelity	0-25
Completeness	0-25
Character/Numeric Accuracy	0-20
Layout-Sensitive Content	0-20
Noise/Garbling	0-10
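The rubric can be made concrete as a scoring function. A minimal sketch, assuming the evaluator reports integer points per dimension plus a list of red flags (the field and function names are hypothetical). The red-flag cap of 90 and the ≥95 acceptance bar come from the flow diagram.

```python
from dataclasses import dataclass
from typing import Tuple

# Maximum points per dimension, as in the rubric table (sums to 100).
MAX_POINTS = {
    "structural_fidelity": 25,
    "completeness": 25,
    "char_numeric_accuracy": 20,
    "layout_sensitive": 20,
    "noise_garbling": 10,
}
ACCEPT_THRESHOLD = 95
RED_FLAG_CAP = 90


@dataclass
class PageScore:
    structural_fidelity: int
    completeness: int
    char_numeric_accuracy: int
    layout_sensitive: int
    noise_garbling: int
    red_flags: Tuple[str, ...] = ()


def total(score: PageScore) -> int:
    """Sum the five dimensions, capping at 90 when any red flag is present."""
    raw = 0
    for name, cap in MAX_POINTS.items():
        value = getattr(score, name)
        if not 0 <= value <= cap:
            raise ValueError(f"{name}={value} is outside 0-{cap}")
        raw += value
    return min(raw, RED_FLAG_CAP) if score.red_flags else raw


def accept(score: PageScore) -> bool:
    """Cheap exit: score >= 95 and no red flags; anything else escalates to Maj@K."""
    return total(score) >= ACCEPT_THRESHOLD and not score.red_flags
```

Because the cap (90) sits below the threshold (95), any red flag forces escalation regardless of the raw score.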

2. `paper-ocr-notes-pipeline` (Full Workflow)

Dir: ~/.config/opencode/skills/paper-ocr-notes-pipeline/

What it does: Complete paper ingestion → notes.md production
Output folder: <output-root>/<paper-slug>/ with: notes.md, README.md, OCR files, comparison report
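The folder layout can be derived from the paper title and the PDF's file stem. A minimal sketch; the slug rule (ASCII, lowercase, hyphen-joined, truncated to 80 characters) and the function names are assumptions, while the file names mirror the final-output listing in the flow diagram:

```python
import re
import unicodedata
from pathlib import Path
from typing import Dict


def paper_slug(title: str) -> str:
    """Hypothetical slug rule: ASCII, lowercase, words joined by hyphens."""
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words)[:80].rstrip("-")


def artifact_paths(output_root: Path, title: str, pdf_stem: str) -> Dict[str, Path]:
    """Paths for every artifact in <output-root>/<paper-slug>/."""
    folder = output_root / paper_slug(title)
    return {
        "pdf": folder / f"{pdf_stem}.pdf",
        "notes": folder / "notes.md",
        "readme": folder / "README.md",
        "ocr": folder / f"{pdf_stem}.ocr.feedback-loop.txt",
        "comparison": folder / f"{pdf_stem}.ocr.feedback-loop-comparison.md",
    }
```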

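Phases (numbered as in the flow diagram below):

Phase 0: Intake & Setup
Phase 1: OCR Feedback Loop
Phase 1.5: Identity Gate
Phase 2: Ground-truth Extraction
Phase 3: Compose Notes
Phase 4: Exploration & Consistency Checks
Phase 5: Revision Passes
Phase 6: Quality Gates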
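Ground-truth extraction keeps stated facts separate from interpretation. A minimal sketch of a record for it, assuming each fact is stored verbatim with the page number taken from the OCR page delimiters (the class, field, and method names are hypothetical; the categories follow Phase 2 in the flow diagram):

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Evidence:
    text: str  # verbatim quote from the OCR output, no interpretation
    page: int  # page number from the "===== PAGE N =====" delimiter


@dataclass
class GroundTruth:
    problem_statement: List[Evidence] = field(default_factory=list)
    contributions: List[Evidence] = field(default_factory=list)
    method_overview: List[Evidence] = field(default_factory=list)
    key_equations: List[Evidence] = field(default_factory=list)
    experimental_setup: List[Evidence] = field(default_factory=list)
    quantitative_results: List[Evidence] = field(default_factory=list)
    stated_limitations: List[Evidence] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Categories with no extracted fact yet, in declaration order."""
        return [name for name, items in asdict(self).items() if not items]
```

Anchoring every fact to a page makes the later "evidence-backed" accuracy gate checkable.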

Notes Structure: TL;DR → Beginner's Guide → Paper Summary → Deep Dive → Critical Analysis → Learning Path → Key Equations → References
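Phase 0 initializes a notes.md skeleton in this order. A minimal sketch, assuming markdown headings and a `TODO` placeholder body (both assumptions, as is the function name); section names follow Phase 3 in the flow diagram:

```python
from typing import List

NOTES_SECTIONS: List[str] = [
    "TL;DR",
    "Beginner's Guide",
    "Paper Summary",
    "Deep Dive",
    "Critical Analysis",
    "Learning Path",
    "Key Equations Summary",
    "References",
]


def notes_skeleton(title: str) -> str:
    """Title heading followed by the eight sections, each with a placeholder."""
    lines = [f"# {title}", ""]
    for section in NOTES_SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```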

How They Connect

paper-ocr-notes (outer agent)
    │
    ├── loads skill: paper-ocr-notes-pipeline (workflow orchestration)
    │
    └── calls skill: pdf-ocr-feedback (OCR engine)
            │
            └── could invoke: ocr-refiner (inner agent) for heavy OCR work

Complete Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                      USER INPUT                                         │
│                         (PDF path / paper URL / arXiv abs URL)                          │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                                            ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                              AGENT: paper-ocr-notes                                     │
│                    ~/.config/opencode/agents/paper-ocr-notes.md                         │
│                                                                                         │
│  Loads: paper-ocr-notes-pipeline (master policy)                                        │
│         pdf-ocr-feedback (OCR engine)                                                   │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                                            │
                ┌───────────────────────────┴───────────────────────────┐
                │                                                       │
                ▼                                                       │
┌───────────────────────────────────┐                                   │
│     PHASE 0: INTAKE & SETUP       │                                   │
├───────────────────────────────────┤                                   │
│ • Resolve identity (title,        │                                   │
│   authors, arXiv/DOI)             │                                   │
│ • Create output folder:           │                                   │
│   <root>/<paper-slug>/            │                                   │
│ • Store source PDF                │                                   │
│ • Init notes.md skeleton          │                                   │
│ • Create paper-local README.md    │                                   │
└───────────────────────────────────┘                                   │
                │                                                       │
                ▼                                                       │
┌───────────────────────────────────────────────────────────────────────┴─────────────────┐
│                        PHASE 1: OCR FEEDBACK LOOP                                       │
│                                                                                         │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                      SKILL: pdf-ocr-feedback                                    │   │
│   │           ~/.config/opencode/skills/pdf-ocr-feedback/                           │   │
│   │                                                                                 │   │
│   │   ┌─────────────────────────────────────────────────────────────────────────┐   │   │
│   │   │                    (Optional) AGENT: ocr-refiner                        │   │   │
│   │   │          ~/.config/opencode/agents/ocr-refiner.md                       │   │   │
│   │   │              (Specialized OCR sub-agent if needed)                      │   │   │
│   │   └─────────────────────────────────────────────────────────────────────────┘   │   │
│   │                                                                                 │   │
│   │   FOR EACH PAGE:                                                                │   │
│   │   ┌─────────────┐                                                               │   │
│   │   │  PASS-1 OCR │ ──────► Full page transcription                               │   │
│   │   └──────┬──────┘                                                               │   │
│   │          │                                                                      │   │
│   │          ▼                                                                      │   │
│   │   ┌─────────────────────────────────────────────────────────────┐               │   │
│   │   │              SELF-EVALUATION (Evaluator Role)               │               │   │
│   │   ├─────────────────────────────────────────────────────────────┤               │   │
│   │   │  Score 0-100 across 5 dimensions:                           │               │   │
│   │   │  • Structural Fidelity     (0-25)                           │               │   │
│   │   │  • Completeness            (0-25)                           │               │   │
│   │   │  • Character/Numeric Acc   (0-20)                           │               │   │
│   │   │  • Layout-Sensitive        (0-20)                           │               │   │
│   │   │  • Noise/Garbling          (0-10)                           │               │   │
│   │   │                                                             │               │   │
│   │   │  + Spot-check 3-5 high-risk snippets                        │               │   │
│   │   │  + Check for RED FLAGS (cap at 90, force retry)             │               │   │
│   │   └──────────────────────────┬──────────────────────────────────┘               │   │
│   │                              │                                                  │   │
│   │              ┌───────────────┴───────────────┐                                  │   │
│   │              │                               │                                  │   │
│   │              ▼                               ▼                                  │   │
│   │   ┌──────────────────┐            ┌──────────────────────────────────┐          │   │
│   │   │  Score ≥ 95 &    │            │  Score < 95 OR Red Flags         │          │   │
│   │   │  No Red Flags    │            │                                  │          │   │
│   │   │                  │            │  ┌────────────────────────────┐  │          │   │
│   │   │    ✓ ACCEPT      │            │  │    MAJ@K ESCALATION        │  │          │   │
│   │   │   (cheap exit)   │            │  ├────────────────────────────┤  │          │   │
│   │   └──────────────────┘            │  │ • K=3 (default pages)      │  │          │   │
│   │                                   │  │ • K=5 (hard pages:         │  │          │   │
│   │                                   │  │   equations, tables,       │  │          │   │
│   │                                   │  │   multi-column, noisy)     │  │          │   │
│   │                                   │  │                            │  │          │   │
│   │                                   │  │ Generate K-1 additional    │  │          │   │
│   │                                   │  │ INDEPENDENT passes         │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │               ▼                  │          │   │
│   │                                   │  ┌────────────────────────────┐  │          │   │
│   │                                   │  │  LINE-LEVEL CONSENSUS VOTE │  │          │   │
│   │                                   │  ├────────────────────────────┤  │          │   │
│   │                                   │  │ • 2+ passes agree → accept │  │          │   │
│   │                                   │  │ • Disputed spans → vote    │  │          │   │
│   │                                   │  │ • Ties → context decides   │  │          │   │
│   │                                   │  │ • No winner → mark         │  │          │   │
│   │                                   │  │   [uncertain]              │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │               ▼                  │          │   │
│   │                                   │  ┌────────────────────────────┐  │          │   │
│   │                                   │  │   RE-EVALUATE MERGED       │  │          │   │
│   │                                   │  └─────────────┬──────────────┘  │          │   │
│   │                                   │               │                  │          │   │
│   │                                   │       ┌───────┴───────┐          │          │   │
│   │                                   │       │               │          │          │   │
│   │                                   │       ▼               ▼          │          │   │
│   │                                   │   ≥ 95?           < 95?          │          │   │
│   │                                   │     │               │            │          │   │
│   │                                   │     ▼               ▼            │          │   │
│   │                                   │  ✓ ACCEPT    ┌─────────────────┐ │          │   │
│   │                                   │              │ TARGETED SPAN   │ │          │   │
│   │                                   │              │ REPAIR          │ │          │   │
│   │                                   │              │ (only flagged   │ │          │   │
│   │                                   │              │  [uncertain]    │ │          │   │
│   │                                   │              │  regions)       │ │          │   │
│   │                                   │              └────────┬────────┘ │          │   │
│   │                                   │                       │          │          │   │
│   │                                   │                       ▼          │          │   │
│   │                                   │              ┌─────────────────┐ │          │   │
│   │                                   │              │ STOPPING CHECK  │ │          │   │
│   │                                   │              │ • ≥95 → ACCEPT  │ │          │   │
│   │                                   │              │ • <2pt gain →   │ │          │   │
│   │                                   │              │   ACCEPT w/note │ │          │   │
│   │                                   │              │ • 3 iterations  │ │          │   │
│   │                                   │              │   → hard cap    │ │          │   │
│   │                                   └──────────────┴─────────────────┴─┘          │   │
│   │                                                                                 │   │
│   │   OUTPUT: Merged pages with ===== PAGE N ===== delimiters                       │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                         │
│   ARTIFACTS PRODUCED:                                                                   │
│   • <paper>.ocr.feedback-loop.txt                                                       │
│   • <paper>.ocr.feedback-loop-comparison.md                                             │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────┐
│   PHASE 1.5: IDENTITY GATE        │
├───────────────────────────────────┤
│ Verify anchors before trusting:   │
│ • Title matches target?           │
│ • Authors match?                  │
│ • arXiv/DOI consistent?           │
│                                   │
│ MISMATCH? → Reject → Re-extract   │
└───────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────┐
│   PHASE 2: GROUND-TRUTH           │
│            EXTRACTION             │
├───────────────────────────────────┤
│ Facts only (no interpretation):   │
│ • Problem statement               │
│ • Contributions as stated         │
│ • Method overview                 │
│ • Key equations/definitions       │
│ • Experimental setup              │
│ • Quantitative results            │
│ • Stated limitations              │
└───────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 3: COMPOSE notes.md                                          │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   1. # <Paper Title>                                                                  │
│   2. ## TL;DR ─────────────────── problem, key idea, 1-2 evidence numbers, caveat     │
│   3. ## Beginner's Guide ──────── college-level, symbols defined, toy example         │
│   4. ## Paper Summary ─────────── problem, contributions, results, experiments table  │
│   5. ## Deep Dive ─────────────── definitions, derivations, implementation sketch     │
│   6. ## Critical Analysis ─────── assumptions, objections, failure modes, tests       │
│   7. ## Learning Path ─────────── prereqs, next steps, diagnostics                    │
│   8. ## Key Equations Summary ─── equation + one-line meaning                         │
│   9. ## References                                                                    │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 4: EXPLORATION & CONSISTENCY CHECKS                          │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   Problem Framing          Method Checks              Claims Checks                   │
│   ─────────────────        ─────────────              ─────────────                   │
│   • What breaks?           • What's optimized?        • Theory vs empirical?          │
│   • Boundary conditions?   • Additive vs mult?        • Evidence sources?             │
│                            • Probs vs grads?          • Missing ablations?            │
│                                                                                       │
│   Consistency Checks                                                                  │
│   ──────────────────                                                                  │
│   • Terminology consistent?  • Equations match implementation?                        │
│   • Baseline independence?   • Proxy computability?                                   │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 5: REVISION PASSES (mandatory)                               │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   Pass 1: CLARITY              Pass 2: CORRECTNESS         Pass 3: OBJECTION TEST     │
│   ───────────────              ──────────────────          ───────────────────        │
│   • Remove duplicates          • Fix conflations           • Strongest objection?     │
│   • Defs before eqs            • Remove unsupported        • Rebuttal conditions?     │
│   • Soften overconfidence      • Fix pseudocode            • Falsifiable tests?       │
│                                                                                       │
│   Pass 4: UPDATE TL;DR CAVEAT                                                         │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌───────────────────────────────────────────────────────────────────────────────────────┐
│                    PHASE 6: QUALITY GATES (must pass all)                             │
├───────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                       │
│   ☐ Accuracy ─────────── symbols/claims correct, evidence-backed                      │
│   ☐ Completeness ─────── experiments table has tasks, baselines, metrics              │
│   ☐ Teachability ─────── beginner section understandable                              │
│   ☐ Critical Rigor ───── ≥1 strong objection + test plan                              │
│   ☐ Implementation ───── pseudocode coherent                                          │
│                                                                                       │
└───────────────────────────────────────────────────────────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                                    FINAL OUTPUT                                         │
│                              <output-root>/<paper-slug>/                                │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                         │
│   📄 <paper>.pdf ──────────────────────────── source file                               │
│   📄 notes.md ─────────────────────────────── PRIMARY DELIVERABLE                       │
│   📄 README.md ────────────────────────────── provenance + artifact index               │
│   📄 <paper>.ocr.feedback-loop.txt ────────── final OCR output                          │
│   📄 <paper>.ocr.feedback-loop-comparison.md  quality/replacement report                │
│                                                                                         │
└─────────────────────────────────────────────────────────────────────────────────────────┘
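Phase 6 is a judgment step, but part of it can be pre-checked mechanically before the final output is written. A minimal sketch, assuming the five gate verdicts are recorded as booleans and that the experiments table in notes.md is a markdown table whose header names tasks, baselines, and metrics (the class, function, and column names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List

GATES = ("accuracy", "completeness", "teachability", "critical_rigor", "implementation")


@dataclass
class GateReport:
    results: Dict[str, bool] = field(default_factory=dict)

    def passed(self) -> bool:
        # "Must pass all": a gate with no recorded verdict counts as a failure.
        return all(self.results.get(g, False) for g in GATES)

    def failures(self) -> List[str]:
        return [g for g in GATES if not self.results.get(g, False)]


def has_experiments_table(notes_md: str) -> bool:
    """Mechanical pre-check for the completeness gate (assumed header words)."""
    for line in notes_md.splitlines():
        if line.lstrip().startswith("|"):
            header = line.lower()
            if all(word in header for word in ("task", "baseline", "metric")):
                return True
    return False
```

A failing pre-check is a cheap signal to loop back to the revision passes; passing it does not replace the human-style judgment each gate describes.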

Design Philosophy

Bitter Lesson Alignment

What's aligned:

Maj@K voting: uses scale (more passes) to improve quality
Self-evaluation loop: the model scores its own output instead of relying on external rules
Targeted repair: spends extra compute only on flagged [uncertain] spans
Stopping on diminishing returns: adaptive compute budget

What's template-driven (but justified):

The 9-section notes.md structure encodes how humans want to consume dense information, not domain-specific knowledge
This is an audience constraint, not a model constraint: "explain simply before explaining deeply" is pedagogy, not a heuristic that scale will make obsolete

Hardcoded Elements (acknowledged trade-offs)

Element	Value	Why It Exists
K values	3 or 5	Compute budget heuristic
Score threshold	95	Quality bar
Rubric weights	25/25/20/20/10	Human-intuited importance
Hard page detection	equations, 3+ cols, etc.	Content heuristic
Max iterations	3 per page, 5 global	Compute cap

These could be made adaptive with more engineering, but the current values work well in practice.
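Gathering the hardcoded values in one place is a small first step toward making them adaptive. A minimal sketch, assuming a frozen Python dataclass as the config holder (the class and field names are hypothetical; the values come from the table above, and how the global cap of 5 is counted is not specified, so it is only carried here):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OcrBudget:
    k_default: int = 3
    k_hard: int = 5
    score_threshold: int = 95
    rubric_weights: Tuple[int, ...] = (25, 25, 20, 20, 10)
    max_iters_per_page: int = 3
    max_iters_global: int = 5  # semantics of the global cap left open

    def __post_init__(self) -> None:
        # Validate invariants the rest of the pipeline relies on.
        if sum(self.rubric_weights) != 100:
            raise ValueError("rubric weights must sum to 100")
        if not 0 < self.score_threshold <= 100:
            raise ValueError("score threshold must be in (0, 100]")
        if self.k_default < 2 or self.k_hard < self.k_default:
            raise ValueError("need 2 <= k_default <= k_hard")
```

Tuning then becomes a matter of constructing a different instance, e.g. `OcrBudget(k_hard=7)`, rather than editing values scattered across the skill files.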

tokenbender/paper-ocr-system.md

Select an option

No results found

Select an option

No results found

Paper OCR System - Complete Architecture

Overview

Location

Two Agents

1. `paper-ocr-notes` (End-to-End Paper Processor)

2. `ocr-refiner` (OCR Specialist)

Two Skills

1. `pdf-ocr-feedback` (OCR Engine)

2. `paper-ocr-notes-pipeline` (Full Workflow)

How They Connect

Complete Flow Diagram

Design Philosophy

Bitter Lesson Alignment

Hardcoded Elements (acknowledged trade-offs)

tokenbender/paper-ocr-system.md

Paper OCR System - Complete Architecture

Overview

Location

Two Agents

1. paper-ocr-notes (End-to-End Paper Processor)

2. ocr-refiner (OCR Specialist)

Two Skills

1. pdf-ocr-feedback (OCR Engine)

2. paper-ocr-notes-pipeline (Full Workflow)

How They Connect

Complete Flow Diagram

Design Philosophy

Bitter Lesson Alignment

Hardcoded Elements (acknowledged trade-offs)

1. `paper-ocr-notes` (End-to-End Paper Processor)

2. `ocr-refiner` (OCR Specialist)

1. `pdf-ocr-feedback` (OCR Engine)

2. `paper-ocr-notes-pipeline` (Full Workflow)