# The LLM Wiki Movement: Coding Agents Meet Knowledge Management

## An In-Depth Research Report

**Date**: April 6, 2026
**Subject**: The convergence of AI coding agents and Obsidian for personal knowledge management, catalyzed by Andrej Karpathy's LLM Wiki pattern

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Origin Story: From Vibe Coding to Vibe Knowledge](#origin-story)
3. [Karpathy's LLM Wiki: The Core Pattern](#the-core-pattern)
4. [Architecture Deep Dive](#architecture-deep-dive)
5. [The Ecosystem: Tools, Plugins, and Implementations](#ecosystem)
6. [Community Implementations and Forks](#community-implementations)
7. [The Obsidian Connection: Why This Tool Won](#obsidian-connection)
8. [Community Voices and Notable Reactions](#community-voices)
9. [Related Concepts and Prior Art](#prior-art)
10. [Critical Analysis and Limitations](#critical-analysis)
11. [Future Directions](#future-directions)
12. [Sources and References](#sources)

---

## 1. Executive Summary {#executive-summary}

In early April 2026, Andrej Karpathy — co-founder of OpenAI, former AI Director at Tesla, and the person who coined "vibe coding" — published a GitHub Gist titled [llm-wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) that ignited a movement. Within 48 hours, the gist had 5,000+ stars and 1,500+ forks. The idea: instead of using LLMs as search engines over your documents (RAG), have them **incrementally build and maintain a persistent, interlinked wiki** — a structured collection of markdown files that compounds knowledge over time.

This report documents the full scope of this movement: the original pattern, the explosion of tools and implementations it spawned, the convergence with Obsidian's ecosystem, the philosophical underpinnings, and a critical analysis of what this approach does and doesn't solve.

**Key finding**: We are witnessing a paradigm shift in which AI coding agents — originally built for software development (Claude Code, OpenAI Codex, Cursor) — are being repurposed as **knowledge engineers**. The same agents that write code now write and maintain wikis. Obsidian, with its local-first markdown architecture, has emerged as the dominant "IDE" for this new workflow. The trend has been described as moving from "vibe coding" to "agentic engineering" applied to knowledge rather than code.
---

## 2. Origin Story: From Vibe Coding to Vibe Knowledge {#origin-story}

### The Karpathy Arc

Andrej Karpathy has been at the center of several defining moments in AI culture:

- **February 2025**: Coins the term **"vibe coding"** — using LLMs to generate code by describing what you want rather than writing it yourself. The term goes viral and enters mainstream vocabulary.
- **2025-2026**: Vibe coding matures. Karpathy later declares it "passé," replaced by what he calls **"agentic engineering"** — orchestrating AI agents rather than writing code directly. ("Agentic because you are not writing the code directly 99% of the time, you are orchestrating agents who do. Engineering to emphasize that there is an art & science and expertise to it.")
- **April 2, 2026**: Karpathy tweets about a shift in how he uses LLMs — from generating code to **generating knowledge structure**. The tweet goes massively viral.
- **April 4, 2026**: Publishes the [llm-wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f), which he describes as an "idea file" — not code to run, but a pattern to share with your LLM agent.

### The "Idea File" as a New Format

Rather than publishing a library, Karpathy introduced a new format: the **idea file**. It's an abstract pattern, written in markdown, designed to be copy-pasted into an AI coding agent's context (via CLAUDE.md, AGENTS.md, etc.). The agent then instantiates a specific implementation tailored to the user's environment, preferences, and domain.

This itself represents a cultural shift: from "open source" (share code) to **"open ideas"** (share patterns that agents implement). As Karpathy wrote: "This document is intentionally abstract. It describes the idea, not a specific implementation. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs."

### Parallel Evolution: Vibe Researching

Independently, OpenAI's Chief Scientist Jakub Pachocki and Chief Research Officer Mark Chen have discussed **"vibe researching"** — conducting research collaboratively with AI as a co-researcher. The LLM Wiki pattern can be seen as the practical infrastructure for vibe researching: the wiki is where the collaboration's output accumulates.
---

## 3. Karpathy's LLM Wiki: The Core Pattern {#the-core-pattern}

### The Problem with RAG

Most people's experience with LLMs and documents looks like Retrieval-Augmented Generation (RAG): upload files, the LLM retrieves relevant chunks at query time, generates an answer. This works, but:

- The LLM **rediscovers knowledge from scratch on every question**
- There's **no accumulation** — ask a subtle question requiring synthesis of five documents, and the LLM must find and piece together fragments every time
- Valuable analysis **disappears into chat history**
- Contradictions between sources **go unnoticed**
- Cross-references are **ad-hoc**, discovered at query time

NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
### The Wiki Alternative

The LLM Wiki inverts this:

> "Instead of just retrieving from raw documents at query time, the LLM **incrementally builds and maintains a persistent wiki** — a structured, interlinked collection of markdown files that sits between you and the raw sources."

When you add a new source:

- The LLM reads it
- Extracts key information
- **Integrates it into the existing wiki** — updating entity pages, revising topic summaries, noting contradictions, strengthening or challenging the evolving synthesis
- Knowledge is **compiled once and kept current**, not re-derived on every query
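The ingest step above can be sketched in a few lines of Python. This is a hypothetical minimal skeleton, not Karpathy's implementation: `summarize_with_llm` is a stub standing in for a real agent call, and a full version would also update `index.md` and the related entity and concept pages.

```python
from datetime import date
from pathlib import Path


def summarize_with_llm(text: str) -> str:
    """Placeholder for a real LLM call (Claude Code, Codex, etc.)."""
    first_line = text.strip().splitlines()[0]
    return f"Summary of: {first_line}"


def ingest(source_path: Path, vault: Path) -> Path:
    """Read a raw source, write a wiki summary page, append to the log."""
    text = source_path.read_text()          # Layer 1: raw/ is only ever read
    summary = summarize_with_llm(text)

    # Layer 2: the LLM-owned wiki page for this source.
    page = vault / "wiki" / "sources" / f"{source_path.stem}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text(f"# {source_path.stem}\n\n{summary}\n")

    # Append-only log entry in the conventional prefix format.
    log = vault / "wiki" / "log.md"
    with log.open("a") as f:
        f.write(f"## [{date.today()}] ingest | {source_path.stem}\n")
    return page
```

The point of the sketch is only the layering: `raw/` is read, `wiki/` is written, `log.md` is appended.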
### The Key Metaphor

Karpathy's central metaphor:

> "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase."

The human's job: curate sources, direct analysis, ask good questions, think about meaning.
The LLM's job: everything else — summarizing, cross-referencing, filing, bookkeeping.

### Why It Works

> "The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass."
---

## 4. Architecture Deep Dive {#architecture-deep-dive}

### Three Layers

```
┌─────────────────────────────────────────────┐
│                 THE SCHEMA                  │
│     (CLAUDE.md / AGENTS.md / GEMINI.md)     │
│    Conventions, workflows, page formats     │
│      Co-evolved between human and LLM       │
├─────────────────────────────────────────────┤
│                  THE WIKI                   │
│ wiki/                                       │
│ ├── index.md (content catalog)              │
│ ├── log.md (chronological record)           │
│ ├── sources/ (per-source summaries)         │
│ ├── concepts/ (topic articles)              │
│ ├── entities/ (people, orgs, tools)         │
│ ├── comparisons/ (cross-cutting analyses)   │
│ └── overview.md (synthesis)                 │
│        LLM-generated, LLM-maintained        │
├─────────────────────────────────────────────┤
│                 RAW SOURCES                 │
│ raw/                                        │
│ ├── articles/ (web-clipped content)         │
│ ├── papers/ (academic papers)               │
│ ├── images/ (diagrams, figures)             │
│ └── assets/ (downloaded attachments)        │
│    Immutable — LLM reads, never modifies    │
└─────────────────────────────────────────────┘
```
**Layer 1 — Raw Sources**: Immutable collection of articles, papers, datasets, images. The LLM reads from these but never modifies them. This is the source of truth.

**Layer 2 — The Wiki**: LLM-generated markdown files forming a structured knowledge base. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent.

**Layer 3 — The Schema**: A configuration document telling the LLM how the wiki is structured, what conventions to follow, what workflows to execute. This is what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. Human and LLM co-evolve this over time.
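As a concrete illustration, a schema file might open like this. The wording below is invented for this report; Karpathy's gist deliberately leaves the specifics to each instantiation.

```markdown
# Wiki Schema

## Layout
- raw/  — immutable sources; read-only, never edit
- wiki/ — you own this layer; keep it consistent

## Conventions
- Every page gets YAML frontmatter: tags, updated, sources
- Link entities with [[wikilinks]]; no dangling links after an ingest
- After any ingest: update index.md, append one line to log.md

## Workflows
- ingest <file>: summarize, integrate, cross-reference, log
- lint: report contradictions, orphans, stale claims
```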
### Three Core Operations

**Ingest** — User drops a new source into `raw/` and tells the LLM to process it. The LLM reads the source, discusses key takeaways, writes a summary page, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages.

**Query** — User asks questions against the wiki. The LLM reads the index to find relevant pages, reads them, and synthesizes an answer with citations. Valuable outputs get filed back as new wiki pages — comparisons, analyses, connections — so explorations compound rather than disappearing into chat history.

**Lint** — Periodic health checks identifying: contradictions between pages, stale claims superseded by newer sources, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled.
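One of the lint checks, orphan detection, is easy to sketch: scan every page for `[[wikilinks]]` and report pages that nothing links to. This is an illustration of the idea, not any particular implementation, and it assumes pages link to each other by file stem.

```python
import re
from pathlib import Path

# Captures the target of [[Page]], [[Page|alias]], and [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(wiki_dir: Path) -> set[str]:
    """Return page stems with no inbound wikilinks from any other page."""
    pages = {p.stem: p for p in wiki_dir.rglob("*.md")}
    linked: set[str] = set()
    for stem, path in pages.items():
        for target in WIKILINK.findall(path.read_text()):
            target = target.strip()
            if target in pages and target != stem:  # ignore self-links
                linked.add(target)
    return set(pages) - linked
```

In practice you would exclude `index.md` from the link scan, since it links to every page by design and would otherwise mask true orphans.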
### Two Key Index Files

**index.md** — Content-oriented. Catalogs every page with a link, one-line summary, and metadata. Organized by category. The LLM reads this first when answering queries. Works surprisingly well at moderate scale (~100 sources, hundreds of pages) without embedding-based RAG infrastructure.

**log.md** — Chronological. Append-only record of what happened and when. Consistent prefix format (e.g., `## [2026-04-02] ingest | Article Title`) makes it parseable with Unix tools. Provides timeline context for new sessions.
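Because every entry shares that prefix format, plain Unix tools can slice the timeline. A quick sketch with made-up entries (the scratch path and titles are illustrative):

```shell
mkdir -p /tmp/llm-wiki-demo && cd /tmp/llm-wiki-demo
cat > log.md <<'EOF'
## [2026-04-02] ingest | Sample Article A
## [2026-04-03] query | How do A and B relate?
## [2026-04-03] ingest | Sample Article B
EOF

# List only the ingest events:
grep '] ingest |' log.md

# Count events by type:
grep -oE '\] [a-z]+ \|' log.md | sort | uniq -c
```

No database, no schema migration: the log stays a plain text file that both the agent and the human can grep.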
### Comparison: Wiki vs. RAG

| Aspect | RAG | LLM Wiki |
|--------|-----|----------|
| Processing timing | Query time (repeated) | Ingest time (once) |
| Cross-references | Ad-hoc discovery | Pre-built, maintained |
| Contradictions | May go unnoticed | Flagged during ingestion |
| Output | Ephemeral chat | Persistent markdown |
| Maintenance | System (opaque) | LLM (transparent) |
| Compounding | None | Every source enriches structure |
| Scale requirement | Embedding infrastructure | index.md + optional search |
| Human readability | Low (chunks) | High (wiki pages) |
---

## 5. The Ecosystem: Tools, Plugins, and Implementations {#ecosystem}

### 5.1 Core Tool Stack (Karpathy's Recommendation)

**Obsidian** — The "IDE" for browsing the wiki. Local-first, markdown-native, with powerful graph visualization, wikilinks, and extensibility.

**Obsidian Web Clipper** — Browser extension converting web articles to markdown for ingestion into `raw/`.

**qmd** (by Tobi Lutke, Shopify founder) — Local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. Both CLI (for agent shell-out) and MCP server (for native tool use). [github.com/tobi/qmd](https://github.com/tobi/qmd)

**Marp** — Markdown-based slide deck format. Obsidian has a plugin. Generate presentations directly from wiki content.

**Dataview** — Obsidian plugin that runs queries over page frontmatter (YAML metadata — tags, dates, source counts) to generate dynamic tables and lists.
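To sketch what Dataview enables here, a query like the following renders a live table of concept pages. The field names (`sources`, `updated`, `status`) and the folder path are hypothetical, assuming the LLM writes them into each page's frontmatter:

```
TABLE sources AS "Source count", updated AS "Last updated"
FROM "wiki/concepts"
WHERE status != "draft"
SORT updated DESC
```

Because the LLM maintains the frontmatter, these tables stay current without anyone hand-editing an index.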
**Git** — The wiki is just a repo of markdown files. Version history, branching, and collaboration for free.
### 5.2 Obsidian Skills (Official, by Steph Ango / kepano)

The CEO of Obsidian, Steph Ango (kepano), released **[obsidian-skills](https://github.com/kepano/obsidian-skills)** — the first Agent Skills implementation officially maintained by a mainstream tool. 13,900+ GitHub stars.

Five skills teaching AI agents native Obsidian operation:

| Skill | Purpose |
|-------|---------|
| **obsidian-markdown** | Wikilinks, callouts, YAML frontmatter, tags, embeds |
| **obsidian-bases** | Structured databases with typed properties, filters, views |
| **json-canvas** | Visual whiteboards with nodes, edges, groups |
| **obsidian-cli** | Terminal-based vault management without GUI |
| **defuddle** | Web page → clean structured Markdown conversion |

Installation: clone the repo, copy the skill files to `.claude/skills/` (or your agent's equivalent), and point the agent at your vault.
### 5.3 Claudian (Claude Code Inside Obsidian)

**[Claudian](https://github.com/YishenTu/claudian)** — An Obsidian plugin (4,500+ stars) that embeds the full Claude Code CLI as a sidebar chat panel inside Obsidian, with the vault as the working directory.

Key features:

- One-click Claude Code launch, no terminal needed
- Multiple conversation tabs for parallel work
- Full agentic capabilities (read, write, edit, search, bash) within the vault
- Inline edit with word-level diff preview
- 1M context window support (Claude Max)

This represents the deepest integration: Claude Code running *inside* Obsidian rather than alongside it.

### 5.4 Obsidian MCP Servers

Multiple MCP (Model Context Protocol) servers bridge AI agents and Obsidian vaults:

**[cyanheads/obsidian-mcp-server](https://github.com/cyanheads/obsidian-mcp-server)** — Comprehensive suite: read, write, search, manage notes/tags/frontmatter. Requires the Obsidian Local REST API plugin.

**MCPVault (@bitbonsai/mcpvault)** — Tag scanning, CLI integration for active files, daily notes, backlinks. v0.11.0 (March 2026).

**[aaronsb/obsidian-mcp-plugin](https://github.com/aaronsb/obsidian-mcp-plugin)** — Direct vault access through semantic operations and HTTP transport.

**Obsidian Sync MCP** — Works locally or via Self-hosted LiveSync for cloud deployment.

Two architectural camps exist:

1. **REST API servers** — require the Obsidian Local REST API plugin and a running Obsidian instance
2. **Filesystem servers** — read markdown directly from disk, no Obsidian dependency
### 5.5 Obsilo Agent

**[Obsilo](https://www.obsilo.ai/)** (by Sebastian Hanke) — An ambitious Obsidian plugin with:

- 55+ tools
- Hybrid semantic search
- 3-tier memory system
- Knowledge graph
- MCP connectors
- Multi-agent workflows
- Plugin discovery (plugins as skills)
- Office document creation
- Safety controls

Open source, free, local-first. Positioned as an autonomous operating layer rather than a chatbot sidebar.

### 5.6 qmd Ecosystem

**[qmd](https://github.com/tobi/qmd)** by Tobi Lutke has spawned its own ecosystem:

- **[obsidian-qmd](https://github.com/achekulaev/obsidian-qmd)** — Local semantic search plugin for Obsidian powered by qmd
- **[qmd-search-obsidian](https://github.com/quakeboy/qmd-search-obsidian)** — Integrates qmd into the native Obsidian search modal
- **[lazyqmd](https://alexanderzeitler.com/articles/introducing-lazyqmd-a-tui-for-qmd/)** — TUI (terminal UI) for qmd
- **qmd-markdown-search skill** — Agent skill for qmd integration

One user reported saving **96% on tokens** by using qmd instead of raw file grepping: "I have an Obsidian vault with 600+ notes. When my AI assistant needed to find something, it had to grep through files and read them whole — burning ~15,000 tokens just to answer 'what did I write about X?'"
### 5.7 Agent Configuration Standards

The pattern works across multiple agent platforms via their respective configuration files:

| Agent | Config File | Ecosystem |
|-------|-------------|-----------|
| Claude Code | `CLAUDE.md` | Anthropic |
| OpenAI Codex | `AGENTS.md` | OpenAI |
| Cursor | `.cursor/rules` | Cursor |
| Windsurf | `.windsurf/rules` | Codeium |
| Google Gemini | `GEMINI.md` | Google |

**AGENTS.md** has emerged as an open standard, stewarded by the Agentic AI Foundation under the Linux Foundation, with contributions from OpenAI, Amp, Jules (Google), Cursor, and Factory.
---

## 6. Community Implementations and Forks {#community-implementations}

Within days of Karpathy's gist, multiple implementations appeared:

### 6.1 obsidian-wiki (by Ar9av)

**[github.com/Ar9av/obsidian-wiki](https://github.com/Ar9av/obsidian-wiki)** — The most comprehensive community implementation. A framework of 13 agent skills for building and maintaining Obsidian wikis.

**Key innovations beyond Karpathy's pattern**:

- **Delta tracking**: A manifest file monitors ingested sources (paths, timestamps, produced pages), so only new or changed content is processed
- **Multi-agent support**: Works with Claude Code, Cursor, Windsurf, Codex, Google Antigravity, GitHub Copilot
- **Four-stage ingest**: Ingest → Extract → Resolve → Schema (the schema emerges from the sources rather than being pre-defined)
- **Archive & rebuild**: Timestamped snapshots for vault restoration
- **Cross-linking automation**: Post-ingest scanning weaves unlinked mentions into wikilinks
- **Tag taxonomy**: Controlled vocabulary system in `_meta/taxonomy.md`
- **Claude conversation mining**: A `/claude-history-ingest` command extracts knowledge from past Claude conversations

13 slash commands are available, including `/wiki-ingest`, `/wiki-lint`, `/wiki-query`, `/cross-linker`, `/tag-taxonomy`, and `/wiki-rebuild`.
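The delta-tracking idea is simple to sketch. The code below is a generic illustration, not Ar9av's actual manifest format: hash every raw file, compare against a stored manifest, and hand back only what is new or changed.

```python
import hashlib
import json
from pathlib import Path


def changed_sources(raw_dir: Path, manifest_path: Path) -> list[Path]:
    """Return raw files that are new or modified since the last run,
    updating the manifest so the next run treats them as processed."""
    manifest: dict[str, str] = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    changed = []
    for p in sorted(raw_dir.rglob("*.md")):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        key = str(p.relative_to(raw_dir))
        if manifest.get(key) != digest:  # new file or contents changed
            changed.append(p)
            manifest[key] = digest

    manifest_path.write_text(json.dumps(manifest, indent=2))
    return changed
```

Content hashes rather than timestamps make the check robust to file copies and clock skew; the manifest itself is just another plain-text artifact that can live in git alongside the wiki.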
### 6.2 llm-knowledge-bases (by rvk7895)

**[github.com/rvk7895/llm-knowledge-bases](https://github.com/rvk7895/llm-knowledge-bases)** — A Claude Code plugin with three tiers of query sophistication:

- **Quick**: Responses from wiki indexes and summaries
- **Standard**: Cross-references the full wiki plus web search
- **Deep**: Multi-agent research pipeline with parallel search agents

Also adds: `/research <topic>` for structured research outlines, `/research-deep` for parallel agent research, and output generation (markdown reports, Marp slides, matplotlib charts saved to `output/`).

### 6.3 llm-wiki-compiler (by ussumant)

**[github.com/ussumant/llm-wiki-compiler](https://github.com/ussumant/llm-wiki-compiler)** — A focused Claude Code plugin that reads all sources and creates topic articles synthesizing everything known about each subject, with backlinks to sources.

### 6.4 kb-template (by jeremyrayner)

**[github.com/jeremyrayner/kb-template](https://github.com/jeremyrayner/kb-template)** — A ready-to-fork template repository. Emphasizes:

- **Explicit**: Knowledge stored as a navigable wiki — inspect exactly what the AI knows
- **Yours**: Everything lives locally as plain files, no vendor lock-in
- **File over app**: Universal markdown, compatible with any tool
- **BYOAI**: Works with Claude Code, Codex, or any agent

Ships as a pre-configured Obsidian vault with reusable prompt templates in `_prompts/`.

### 6.5 karpathy-llm-wiki (by Astro-Han)

**[github.com/Astro-Han/karpathy-llm-wiki](https://github.com/Astro-Han/karpathy-llm-wiki)** — A single skill that packages the entire Karpathy LLM Wiki pattern for drop-in use.

### 6.6 Additional Forks

| Project | URL | Description |
|---------|-----|-------------|
| llm-wiki (hellohejinyu) | [GitHub](https://github.com/hellohejinyu/llm-wiki) | LLM-powered personal wiki CLI with multi-provider support |
| LLM-wiki (Ss1024sS) | [GitHub](https://github.com/Ss1024sS/LLM-wiki) | Direct implementation of Karpathy's gist pattern |

### 6.7 The Broader "AI Second Brain" Ecosystem

The LLM Wiki trend is converging with the broader "Second Brain" movement:

- **[Forte Labs AI Second Brain](https://fortelabs.com/blog/introducing-the-ai-second-brain/)** — Tiago Forte (author of *Building a Second Brain*) is now integrating AI agents into his methodology
- **[second-brain-agent](https://github.com/flepied/second-brain-agent)** — Open-source AI agent for second brain workflows
- **[Second Brain I/O](https://second-brain.io/)** — AI-powered PKM with "supermemory" for AI agents
- **[AFFiNE](https://affine.pro/blog/build-ai-second-brain)** — Alternative knowledge tool building AI second brain features
- **[Copilot for Obsidian](https://www.obsidiancopilot.com/en)** — Leading in-vault AI assistant with 100,000+ users, model-agnostic (OpenAI, Anthropic, Google, Ollama), RAG-based vault QA, and a Plus tier with autonomous agent capabilities

The framing is shifting from "Personal Knowledge Management" to **"Personal Context Management"** — the bottleneck is no longer AI capability but your ability to give the AI the right information at the right time.
---

## 7. The Obsidian Connection: Why This Tool Won {#obsidian-connection}

### Why Obsidian Specifically?

Several properties make Obsidian the natural fit for LLM-maintained wikis:

1. **Local-first, plain markdown**: Files live on disk. Any tool can read/write them — editors, git, grep, AI agents. No proprietary format, no API needed.
2. **Wikilinks (`[[page]]`)**: Native bidirectional linking is the core primitive for knowledge graphs. LLMs can easily generate and maintain wikilinks.
3. **Graph view**: Visual representation of the wiki's structure — hubs, orphans, clusters. Essential for humans to understand what the LLM has built.
4. **YAML frontmatter**: Structured metadata on every page. LLMs add tags, dates, source counts, confidence levels. The Dataview plugin queries this metadata dynamically.
5. **Plugin ecosystem**: Web Clipper (ingest), Marp (slides), Dataview (queries), Excalidraw (diagrams), Breadcrumbs (navigation).
6. **No vendor lock-in**: Switch between Claude, GPT, Gemini, or local models without touching the wiki. The markdown is universal.
7. **Community alignment**: Obsidian's community values local-first, privacy-first, plain-text principles — naturally aligned with the LLM Wiki ethos.
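Point 4 in practice: a concept page maintained by the agent might open with frontmatter like the following. The field names are illustrative, not a fixed schema — each instantiation of the pattern picks its own.

```yaml
---
tags: [concept, retrieval]
status: reviewed        # draft | reviewed | stale
sources: 7              # how many raw sources feed this page
updated: 2026-04-05
confidence: medium      # the LLM's own uncertainty flag
---
```

Because this is plain YAML on disk, the same metadata is queryable by Dataview inside Obsidian and greppable by any agent outside it.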
### The Obsidian CEO's Move

Steph Ango's decision to release official Obsidian Skills signals institutional recognition: the future of Obsidian includes AI agents as first-class users. This is significant — rather than building AI features into Obsidian itself, the approach is to teach AI agents to use Obsidian natively.

### The "IDE for Knowledge" Metaphor

The Karpathy metaphor — "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase" — reframes knowledge management in terms software developers already understand:

| Coding | Knowledge |
|--------|-----------|
| VS Code / IDE | Obsidian |
| Developer | LLM Agent |
| Codebase | Wiki |
| Git commits | Log entries |
| Linting | Wiki health checks |
| Code review | Human reading wiki updates |
| CLAUDE.md / AGENTS.md | Schema (same file!) |

This is why **coding agents** specifically are being repurposed: Claude Code, Codex, and Cursor already know how to read files, edit files, maintain cross-references, follow conventions in config files, and run health checks. These are exactly the skills needed for wiki maintenance.

### The "Contamination Mitigation" Pattern

Steph Ango suggested a crucial operational pattern: maintain a **high-signal personal vault** alongside an agent-facing **"messy" vault**. This prevents LLM-generated content from contaminating hand-curated notes. The clean vault remains human-authored; the agent vault is where LLMs operate freely. Cross-referencing happens via links, not co-location.

### Karpathy's Scale Report

At current usage, Karpathy reports managing ~100 articles and ~400,000 words through this system. At this scale, the LLM's ability to navigate via summaries and index files is "more than sufficient" — and as he notes, "fancy RAG infrastructure often introduces more latency and retrieval noise than it solves at this scale."
---

## 8. Community Voices and Notable Reactions {#community-voices}

### Hacker News Discussions

Two major HN threads emerged:

- [LLM Wiki — example of an "idea file"](https://news.ycombinator.com/item?id=47640875) — discussion of the original gist
- [Show HN: LLM Wiki — Open-Source Implementation](https://news.ycombinator.com/item?id=47656181) — community implementation thread

### Notable Quotes from the Community

**Charly Wargnier**: "It acts as a living AI knowledge base that actually heals itself."

**Vamshi Reddy**: "Every business has a raw/ directory. Nobody's ever compiled it. That's the product."

**Jason Paul Michaels**: "No vector database. No embeddings... Just markdown, FTS5, and grep..."

**Lex Fridman**: Described generating dynamic HTML with JS for sort/filter/visualization, and generating "temporary focused mini-knowledge-bases" loaded into LLMs for voice-mode interaction during 7-10 mile runs — using the wiki as podcast research infrastructure.

**Andrew Levine** (on qmd): "Holy crap. qmd by @tobi saved me 96% on tokens with clawdbot. I have an Obsidian vault with 600+ notes. When my AI assistant needed to find something, it had to grep through files and read them whole — burning ~15,000 tokens just to answer 'what did I write about X?'"

### The Paradigm Shift Framing

The community framing has crystallized around a core insight: **coding agents are evolving from code generators into knowledge compilers**. The same tools built for software engineering (file reading, editing, cross-referencing, linting, version control) turn out to be exactly what's needed for knowledge management.

This is described as a shift from **"query-time retrieval" (RAG) to "compile-time synthesis"** — the LLM pre-processes and structures knowledge rather than searching it on demand.
---

## 9. Related Concepts and Prior Art {#prior-art}

### Vannevar Bush's Memex → Luhmann's Zettelkasten → LLM Wiki

The genealogy is clear: Bush envisioned associative knowledge trails (1945), Luhmann built a partial manual version across 40 years (~90,000 interlinked notes), and Karpathy automates the maintenance burden that defeated all previous attempts.

### Vannevar Bush's Memex (1945)

Karpathy explicitly traces his pattern to Vannevar Bush's 1945 essay "As We May Think," which described the **Memex** — a personal, curated knowledge store with associative trails between documents. Bush's vision was private, actively curated, with connections as valuable as documents themselves. The unsolved problem: who does the maintenance? LLMs now handle that.

### Niklas Luhmann's Zettelkasten

The German sociologist Niklas Luhmann maintained a physical slip-box (Zettelkasten) of ~90,000 interlinked notes over 40 years, producing 70+ books. The Zettelkasten method emphasizes:

- Atomic notes (one idea per note)
- Permanent links between notes
- Personal reformulation (writing in your own words)

The LLM Wiki automates the linking and maintenance but raises a question: is the personal reformulation step — which Luhmann considered essential for thinking — lost when the LLM writes the notes?

### The Extended Brain Critique

The Substack essay "The Wiki That Writes Itself" highlights this tension: "Curating sources and asking questions differ substantially from the act of writing that discovers thinking through formulation itself. Luhmann's insight — that friction in personal reformulation isn't inefficiency but cognitive mechanism — remains untouched by automation."

**Recommended hybrid**: Use Karpathy's system for reconnaissance and domain-mapping. Transition to Luhmann-like synthesis when developing arguments, where personal reformulation is essential. "The AI compiles territory; the writer must walk it."

### NotebookLM and ChatGPT

Google's NotebookLM and ChatGPT's file upload features represent the RAG approach the LLM Wiki aims to surpass. These tools re-derive knowledge on every query rather than building persistent structure. However, they have lower setup cost and work out-of-the-box.

### GraphRAG

Microsoft's GraphRAG extracts relational structure for machine-oriented retrieval, prioritizing sophisticated querying over human readability. **Complementary, not competing** — GraphRAG and the LLM Wiki could coexist, with GraphRAG powering the retrieval layer while the wiki remains the human-facing interface.

### The ".brain" Pattern

Community contributors have described `.brain` folders for persistent project memory — a lightweight version of the wiki pattern focused on a single project's context rather than broad knowledge management.
| --- | |
| ## 10. Critical Analysis and Limitations {#critical-analysis} | |
| ### What This Pattern Solves | |
| 1. **The maintenance problem**: LLMs eliminate the bookkeeping burden that causes humans to abandon wikis | |
| 2. **Knowledge compounding**: Every source and query enriches the structure rather than disappearing | |
| 3. **Cross-reference consistency**: LLMs can touch 15 files in one pass, keeping references current | |
| 4. **Contradiction detection**: New sources are checked against existing claims during ingest | |
| 5. **Context persistence**: No more "lobotomy" when ending a session — the wiki preserves everything | |
| ### What It Doesn't Solve | |
| 1. **Original synthesis**: A perfectly maintained wiki doesn't automatically produce original arguments. The compilation step organizes knowledge; the creative synthesis still requires human thought. | |
| 2. **Hallucination risk**: LLM-generated wiki pages can contain confident-sounding but incorrect summaries. Bad connections look plausible despite lacking justification. Without careful review, hallucinations can "harden into apparent fact." | |
| 3. **Authority creep**: Wiki pages may be treated as authoritative when they're LLM interpretations of sources. The raw sources must remain the ground truth. | |
| 4. **Scale limits**: The index.md approach works at ~100 sources / hundreds of pages. Beyond that, you need proper search infrastructure (qmd, embeddings, GraphRAG). | |
| 5. **Cost**: Active wiki maintenance with frontier models isn't free. Each ingest touching 10-15 pages consumes significant tokens. One user reported spending 96% fewer tokens after adding qmd — the default workflow can be expensive. | |
| 6. **Review burden**: Someone still needs to read what the LLM wrote and verify accuracy. The maintenance burden shifts from "writing" to "reviewing" — less tedious, but not zero. | |
| ### Implementation Safeguards | |
| Essential boundaries to prevent problems: | |
| - Raw sources remain **read-only** (LLM never modifies them) | |
| - Important wiki changes require **human review** | |
| - Sensitive pages need **approval gates** | |
| - Uncertain claims stay marked as **drafts** | |
| - Destructive actions are **never automated** | |
| - Frontmatter tracks status, ownership, source counts, review dates | |
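The last safeguard is mechanically checkable. Below is a minimal sketch of a lint script that enforces required frontmatter fields before a page is promoted; the field names (`status`, `reviewed`, `sources`) and the `draft`/`approved` status values are illustrative assumptions, not part of any published spec:

```python
import re
from pathlib import Path

# Hypothetical frontmatter fields a wiki might track; the names are
# illustrative, not a standard from Karpathy's gist.
REQUIRED_FIELDS = {"status", "reviewed", "sources"}
ALLOWED_STATUS = {"draft", "approved"}

def parse_frontmatter(text: str) -> dict:
    """Extract simple `key: value` pairs from a YAML-style frontmatter block."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def check_page(path: Path) -> list[str]:
    """Return a list of safeguard violations for one wiki page."""
    fields = parse_frontmatter(path.read_text(encoding="utf-8"))
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - fields.keys())]
    if fields.get("status") not in ALLOWED_STATUS:
        problems.append(f"status must be one of {sorted(ALLOWED_STATUS)}")
    return problems

def audit(root: Path) -> dict[str, list[str]]:
    """Map each offending page to its violations; empty dict means all clean."""
    return {
        str(path): problems
        for path in sorted(root.rglob("*.md"))
        if (problems := check_page(path))
    }
```

Run as a pre-commit hook or CI step, this turns "uncertain claims stay marked as drafts" from a convention into an enforced gate: a page with no `status: approved` line simply fails the audit.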
| --- | |
| ## 11. Future Directions {#future-directions} | |
| ### Tool Convergence | |
| The ecosystem is converging around a standard stack: Obsidian + AI Agent + MCP Server + qmd + Git. Expect tighter integration — possibly Obsidian-native support for agent-maintained wikis, or dedicated "knowledge agent" products built on this pattern. | |
| ### Multi-Agent Wikis and the "Swarm Knowledge Base" | |
| Team wikis maintained by multiple LLM agents are already emerging. @jumperz (founder of Secondmate) created a **"Swarm Knowledge Base"** that scales to a 10-agent system managed via OpenClaw. A dedicated "Quality Gate" agent using the Hermes model (Nous Research) validates every draft article before promotion to the live wiki, creating a **"Compound Loop"** — the wiki gets richer, which makes the agents smarter, which makes the wiki richer. | |
| This points toward production-grade systems: multiple agents specializing in different aspects (ingest, synthesis, quality, cross-referencing), with human oversight at the approval stage rather than the writing stage. | |
| ### Knowledge as Training Data | |
| As wikis mature, they become high-quality, structured datasets. Karpathy notes that "synthetic data generation and fine-tuning become natural extensions" — you could fine-tune a model on your personal wiki for better domain performance. | |
| ### The "Agentic Engineering" of Knowledge | |
| Just as "agentic engineering" replaced "vibe coding" for software development, we may see a parallel evolution for knowledge work: from casual ChatGPT conversations to structured, agent-maintained knowledge systems. The wiki pattern is the first concrete implementation of this vision. | |
| ### Product Opportunities | |
| The current approach requires significant manual setup. Purpose-built products could: package the pattern as a one-click solution, add multi-user collaboration, integrate real-time source feeds (RSS, social media, research alerts), provide visual dashboards for wiki health, and manage costs through intelligent batching. | |
| --- | |
| ## 12. Sources and References {#sources} | |
| ### Primary Source | |
| - [Karpathy's llm-wiki gist](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) — The original idea file (April 4, 2026) | |
| ### News and Analysis | |
| - [VentureBeat — "Karpathy shares 'LLM Knowledge Base' architecture that bypasses RAG"](https://venturebeat.com/data/karpathy-shares-llm-knowledge-base-architecture-that-bypasses-rag-with-an) | |
| - [Analytics India Magazine — "Andrej Karpathy Moves Beyond RAG"](https://analyticsindiamag.com/ai-news/andrej-karpathy-moves-beyond-rag-builds-llm-powered-personal-knowledge-bases) | |
| - [MindStudio — "What Is Andrej Karpathy's LLM Wiki?"](https://www.mindstudio.ai/blog/andrej-karpathy-llm-wiki-knowledge-base-claude-code) | |
| - [Analytics Drift — "How AI Is Replacing Personal Note-Taking"](https://analyticsdrift.com/karpathy-llm-knowledge-base-vibe-coding-workflow/) | |
| - [Startup Fortune — "LLM Wiki, a Living Archive for AI Ideas"](https://startupfortune.com/andrej-karpathy-unveils-llm-wiki-a-living-archive-for-ai-ideas/) | |
| - [The New Stack — "Vibe Coding is Passé"](https://thenewstack.io/vibe-coding-is-passe/) | |
| - [Antigravity Codes — "Karpathy's LLM Wiki: The Complete Guide"](https://antigravity.codes/blog/karpathy-llm-wiki-idea-file) | |
| ### Community Analysis | |
| - [Medium (Dreamwalker) — "What Is an LLM Wiki, and Why Are People Paying Attention?"](https://medium.com/@aristojeff/what-is-an-llm-wiki-and-why-are-people-paying-attention-to-it-b7e10617967d) | |
| - [Extended Brain Substack — "The Wiki That Writes Itself"](https://extendedbrain.substack.com/p/postscript-the-wiki-that-writes-itself) | |
| - [Fabian G. Williams — "I Built a Knowledge Base That Writes Itself"](https://www.fabswill.com/blog/building-a-second-brain-that-compounds-karpathy-obsidian-claude) | |
| - [Glen Rhodes — "Karpathy's LLM-powered personal knowledge base workflow"](https://glenrhodes.com/andrej-karpathys-llm-powered-personal-knowledge-base-workflow-using-markdown-wikis-and-obsidian/) | |
| - [Louis Wang — "Building a Self-Improving Personal Knowledge Base Powered by LLM"](https://louiswang524.github.io/blog/llm-knowledge-base/) | |
| - [Codersera — "Karpathy's LLM Knowledge Base: Build an AI Second Brain"](https://ghost.codersera.com/blog/karpathy-llm-knowledge-base-second-brain/) | |
| - [HowAIWorks.ai — "The Karpathy Method"](https://howaiworks.ai/blog/andrej-karpathy-llm-knowledge-bases) | |
| - [a2a-mcp.org — "Building AI-Powered Wikis in Obsidian"](https://a2a-mcp.org/blog/andrej-karpathy-llm-knowledge-bases-obsidian-wiki) | |
| ### Tools and Implementations | |
| - [obsidian-wiki (Ar9av)](https://github.com/Ar9av/obsidian-wiki) — 13-skill framework for Karpathy's pattern | |
| - [llm-knowledge-bases (rvk7895)](https://github.com/rvk7895/llm-knowledge-bases) — Claude Code plugin with multi-tier querying | |
| - [llm-wiki-compiler (ussumant)](https://github.com/ussumant/llm-wiki-compiler) — Focused wiki compilation plugin | |
| - [kb-template (jeremyrayner)](https://github.com/jeremyrayner/kb-template) — Ready-to-fork template | |
| - [karpathy-llm-wiki (Astro-Han)](https://github.com/Astro-Han/karpathy-llm-wiki) — Single-skill package | |
| - [obsidian-skills (kepano)](https://github.com/kepano/obsidian-skills) — Official Obsidian agent skills (13.9k stars) | |
| - [Claudian (YishenTu)](https://github.com/YishenTu/claudian) — Claude Code embedded in Obsidian (4.5k stars) | |
| - [qmd (tobi)](https://github.com/tobi/qmd) — Local hybrid search engine for markdown | |
| - [obsidian-qmd](https://github.com/achekulaev/obsidian-qmd) — qmd Obsidian plugin | |
| - [obsidian-mcp-server (cyanheads)](https://github.com/cyanheads/obsidian-mcp-server) — MCP bridge to Obsidian | |
| - [Obsilo Agent](https://www.obsilo.ai/) — 55+ tool AI agent for Obsidian | |
| ### Related Discussions | |
| - [Obsidian Forum — Claude Code from the Sidebar](https://forum.obsidian.md/t/claude-code-from-the-sidebar/109634) | |
| - [Obsidian Forum — Obsilo Agent announcement](https://forum.obsidian.md/t/ai-agent-that-learns-your-vault-your-rules-your-workflows/111869) | |
| - [Vibecoding.app — Obsidian Skills Review 2026](https://vibecoding.app/blog/obsidian-skills-review) | |
| - [Medium (Addo Zhang) — "Obsidian Skills: Empowering AI Agents"](https://addozhang.medium.com/obsidian-skills-empowering-ai-agents-to-master-obsidian-knowledge-management-8b4f6d844b34) | |
| - [Yuchen Jin on X — Diagram of Karpathy's pattern](https://x.com/Yuchenj_UW/status/2040482771576197377) | |
| ### Hacker News Discussions | |
| - [HN: LLM Wiki — example of an "idea file"](https://news.ycombinator.com/item?id=47640875) | |
| - [Show HN: LLM Wiki — Open-Source Implementation](https://news.ycombinator.com/item?id=47656181) | |
| ### AI Second Brain Ecosystem | |
| - [Forte Labs — AI Second Brain](https://fortelabs.com/blog/introducing-the-ai-second-brain/) | |
| - [second-brain-agent (GitHub)](https://github.com/flepied/second-brain-agent) | |
| - [Second Brain I/O](https://second-brain.io/) | |
| - [AFFiNE — AI Second Brain](https://affine.pro/blog/build-ai-second-brain) | |
| - [Copilot for Obsidian](https://www.obsidiancopilot.com/en) — 100K+ users | |
| - [NxCode — Obsidian AI Second Brain Complete Guide](https://www.nxcode.io/resources/news/obsidian-ai-second-brain-complete-guide-2026) | |
| ### Tutorials and Guides | |
| - [Tutorial: Obsidian Knowledge Base with Claude Code](https://marketingagent.blog/2026/03/28/tutorial-obsidian-knowledge-base-with-claude-code/) | |
| - [How to Build a Local LLM Knowledge Base With Obsidian (2026)](https://www.modemguides.com/blogs/ai-infrastructure/local-llm-knowledge-base-obsidian-setup-guide) | |
| - [How to Connect Obsidian to Claude Code (5 Best Ways)](https://pixelnthings.com/connect-obsidian-to-claude-code/) | |
| - [Obsidian + Claude AI Knowledge Management Setup 2026](https://www.buildmvpfast.com/blog/obsidian-claude-ai-knowledge-management-system-2026) | |
| ### Standards and Protocols | |
| - [AGENTS.md specification](https://agents.md/) — Open standard for agent configuration (Linux Foundation / Agentic AI Foundation) | |
| - [OpenAI Codex AGENTS.md guide](https://developers.openai.com/codex/guides/agents-md) | |
| - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) — Standard for AI-tool integration | |
| - [Agent Skills — Codex](https://developers.openai.com/codex/skills) — OpenAI's agent skills specification | |
| --- | |
| *This report was compiled on April 6, 2026 — two days after Karpathy's gist publication. The ecosystem is evolving rapidly; implementations and tools listed here may have changed by the time you read this.* |