Skip to content

Instantly share code, notes, and snippets.

@karpathy
Created April 4, 2026 16:25
Show Gist options
  • Select an option

  • Save karpathy/442a6bf555914893e9891c11519de94f to your computer and use it in GitHub Desktop.

Select an option

Save karpathy/442a6bf555914893e9891c11519de94f to your computer and use it in GitHub Desktop.
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, it is designed to be copy pasted to your own LLM Agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, or etc.). Its goal is to communicate the high level idea, but your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@Stephenusegithub
Copy link
Copy Markdown

Stephenusegithub commented May 25, 2026 via email

@kytmanov
Copy link
Copy Markdown

kytmanov commented May 25, 2026

Synto v0.3.0 is out.

https://github.com/kytmanov/synto

Main addition: an MCP server with 8 tools, ready to wire into Claude Code, Cursor, or any MCP client. Your AI assistant can search the wiki, read articles, look up concepts, and run full queries against it.

Also new in v0.3:

  • three-state drafts: draft → verified → published. Review without publishing.
  • queries match by alias too, not just exact titles ("ML" finds "Machine Learning").
  • frontmatter now carries quality signals: source count, single-source flag, source quality.

Same shape as before:

  • works with local LLMs
  • great with Ollama and LM Studio
  • plain Markdown
  • Obsidian-friendly
  • no vector DB
  • no cloud required
  • multi-language

Star it if you want to support local-first AI tools. Fork it if you want to build on it.

@pjdurka
Copy link
Copy Markdown

pjdurka commented May 26, 2026

Great idea, thanks. I took it in a direction of maintainer knowledge for open-source projects, targeting the bus-factor problem: Westner et al. (2025) found bus factors of 1–3 across major neuroscience packages.

Based explicitly upon your pattern, maintainer-wiki-kit.md has been working well for our new neuroinformatics project IDE4EEG and others :-)

@FCall75
Copy link
Copy Markdown

FCall75 commented May 26, 2026

In Obsidian with the Open Copilot Chat plugin, there's no need to use Claude Code; everything is done directly in Obsidian once the "prompt" has been copied there.

@ulyssestenn
Copy link
Copy Markdown

This is a great idea. Worked beyond my expectations. I made a turnkey version where everything lives in a Git repo: Funes. I also made an example library where you can browse the kind of output it creates.

@podviaznikov
Copy link
Copy Markdown

podviaznikov commented May 27, 2026

In case anyone need to export their apple notes or bear notes into obsidian to do llm wiki: I've made Obsidian Plugin and macOS real time exporter app with a quick demo.

@noirblue
Copy link
Copy Markdown

Been running Karpathy's LLM-Wiki pattern for five weeks now.

The compounding effect is real. Drop a paper, get a cross-linked wiki page. Ask a question, get a cited answer that gets filed back into the graph. The maintenance burden is near zero because the LLM does the bookkeeping.

But — when I tried to scale it beyond markdown notes, I hit a wall.

PDFs needed PyMuPDF. Office docs needed python-docx. Video transcripts needed Whisper. Structured LLM outputs needed Instructor. Each tool had its own vault, its own SQLite schema, its own API surface. I built a 450-line Python glue layer to stitch them together.

The glue proved the problem: fragmentation at the architecture layer. Four vaults, four schemas, wikilink collisions, frontmatter drift, MCP namespace explosions. Glue is insufficient because it preserves the fragmentation; it just moves it around.

So I wrote a spec for unification at the architecture layer instead:

Mnemosyne — a local-first semantic memory OS with a Rust core and Python satellites.

  • One vault: raw/, wiki/, memory/, packs/
  • One schema: SQLite state.db with pages, links, jobs, conversations, audit_log, contradictions
  • One API: MCP + REST + CLI from day one
  • Rust core: schema, job queue, graph traversal, API surface
  • Python satellites: PDF extraction, office docs, video/audio transcripts, LLM client glue, structured compilation
  • Two-tier compilation: fast model (30B) extracts concepts, heavy model (70B+) writes cross-linked articles
  • Audit engine: structural lint → contradiction detection → adversarial review
  • Provenance-first: every claim carries citation, every LLM call carries cost log

The Python satellites are permanent, not transitional. Rust cannot yet match Python's depth in document extraction (pymupdf, python-docx, Whisper) or structured LLM output (Instructor, Pydantic-AI). The Rust core spawns them as subprocesses, receives JSON, and integrates into the unified system.

Status: architecture spec complete. Seeking core contributors for Phase 1 implementation (SQLite schema, vault init, markdown ingestion, basic MCP server).

Spec: https://github.com/noirblue/IsaacCLupus_mnemosyn_spec

If you've hit the same fragmentation wall trying to scale the LLM-Wiki pattern, I'd love to talk.

@ethanj
Copy link
Copy Markdown

ethanj commented May 28, 2026

Been running Karpathy's LLM-Wiki pattern for five weeks now.

The compounding effect is real. Drop a paper, get a cross-linked wiki page. Ask a question, get a cited answer that gets filed back into the graph. The maintenance burden is near zero because the LLM does the bookkeeping.

But — when I tried to scale it beyond markdown notes, I hit a wall.

....

If you've hit the same fragmentation wall trying to scale the LLM-Wiki pattern, I'd love to talk.

@noirblue

I'm curious if you have taken a look at my repo - 1.3k stars so far.. its got some of those details covered and I would love to have more good brains on it, in particular people who are interested in how to make the architecture even better.
I just dropped v0.8.0 with some cool new features... Graph/context layer , read-only visualization, Multimodal ingest, etc
https://github.com/atomicstrata/llm-wiki-compiler

@AgriciDaniel
Copy link
Copy Markdown

claude-obsidian: a self-organizing AI wiki for Obsidian + Claude Code

This pattern crystallized something a lot of us were circling. I ended up building a full open-source implementation of it as my daily driver: claude-obsidian (MIT, runs in Claude Code). Same three layers you describe, raw → wiki → schema, with index.md for navigation and an append-only log.md, and the model maintains the wiki while you pick sources and ask questions.

Here are two real vaults it built, the graph is generated, not hand-linked:

Dense generated knowledge graph Vault folder structure and graph

A few questions in this thread are exactly the ones it tries to answer:

  • New page vs. edit an existing one (@alinawab): on each ingest it extracts entities/concepts, searches the existing vault first, and edits + cross-links on a match, only creating a page when nothing fits. Wikilinks and contradiction flags are added automatically.
  • Where it starts fighting you: the one that bit me was concurrent writes (two agents ingesting at once could corrupt a page mid-write); v1.7 added per-file locking to close that. The other is retrieval drift on big vaults, handled with optional hybrid retrieval (contextual prefix + BM25 + rerank, per Anthropic's contextual-retrieval work).
  • Sharing with a team (@geetansharora): it's just plain Markdown in a git repo, no DB or MCP server required, so a team shares it like code, clone the vault and everyone's Claude reads the same files.

Free and MIT. Two-minute walkthrough + repo:

Thanks for the writeup, @karpathy, it's shaped how I think about this.

@ranga291257
Copy link
Copy Markdown

ranga291257 commented May 28, 2026

Thanks for this write-up, @karpathy. It helped me connect and contextualize many years of personal OneNote documentation that had accumulated as scattered knowledge.

I have also started applying this idea to a fictional manufacturing plant use case: a web interface that ingests operating, maintenance, engineering and troubleshooting documents that normally reside in different places and must be manually pieced together when a plant problem occurs.

The aim is to create an evolving knowledge layer that helps users connect information across documents during troubleshooting, while allowing them to select and run local LLM models of their choice.

For industrial applications, this approach is especially valuable because useful knowledge is rarely contained in one document; it emerges from the relationships among operating history, equipment records, procedures, drawings and past decisions.
plant_wiki_controlPanel
you may look in to my linkedin post as well https://www.linkedin.com/feed/update/urn:li:activity:7465711718795771904/

@alirezabbasi
Copy link
Copy Markdown

https://github.com/alirezabbasi/echel/releases/tag/v0.2.0

I just released Echel v0.2.0.

Echel started from the same core insight behind Andrej Karpathy’s “LLM Wiki” idea: AI workflows should not keep rediscovering knowledge from scratch every session.

The LLM Wiki pattern says: instead of relying only on RAG or chat history, let AI continuously maintain a structured, interlinked wiki that compounds over time.

Echel takes that idea into software product creation.

It is a platform where a domain expert defines a problem, intended solution, constraints, risks, MVP, stack, and direction, and Echel continuously turns that intent into:

  • clarified requirements
  • product architecture
  • roadmap
  • executable work
  • graph-backed AI agent packets
  • review reports
  • readiness checks
  • proof packs
  • compounding project intelligence

The key idea is simple:

The product should not lose memory every time the AI context window resets.

In v0.2.0, Echel now includes:

  • product-first initialization
  • root-level product wiki/
  • internal echel-core/ runtime split
  • product-owner commands like define, clarify, plan, build, review, steer, and status
  • typed product intelligence graph
  • agent work packets
  • product cockpit
  • milestone/readiness gates
  • proof packs and release summaries

The direction is to make Echel a product-creation operating system for AI-native development.

A business owner or domain expert should be able to steer the product through intent and decisions, while Echel and AI agents handle the structured execution loop.

Huge inspiration from the LLM Wiki pattern:
https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f

Echel v0.2.0 is the first version where that idea becomes a fuller software-development workflow:

intent -> memory -> graph -> plan -> build packet -> implementation -> review -> readiness -> updated intelligence

@brtrx
Copy link
Copy Markdown

brtrx commented May 28, 2026

Built a lint extension to combat ingestion order bias in wikis with 50+ sources: LLM Wiki: A Bias-Aware Stateful Lint

Two problems emerge at scale — context limits kill single-pass lint, and ingestion order biases the wiki toward earlier sources. The extension runs lint in batches of 5 with a persistent scratchpad across sessions, and uses a randomised (not ingestion-order) source sequence to surface cross-cutting contradictions that ordered passes miss.

Drops into an existing CLAUDE.md. Doesn't fix framing bias baked in at ingest time, but repairs the structural layer reliably.

@noirblue
Copy link
Copy Markdown

A few people have asked: isn't this just a wiki compiler with extra steps?

No. Wiki compilation is one layer of seven. The broader goal is a knowledge lifecycle, not just ingestion:

  • Agent memory: memory/inbox/memory/committed/ — propose, review, link, decay
  • Conversation history: token-budgeted, cross-session, queryable
  • Dual-node deployment: Foundry (batch compile/audit) + Frontline (interactive serve)
  • Audit as infrastructure: every claim traced to source, every LLM call logged with cost/hash
  • Pack exports: agent-ready subsets of the graph for action

The architecture uses a Rust core for the performance-critical path — schema, job queue, graph traversal, API surface — while Python satellites handle document extraction and LLM orchestration where Python's ecosystem has no equal. The split is permanent, not transitional.

The LLM-Wiki pattern proved that compounding works. Mnemosyne asks: what happens when you treat that pattern as infrastructure — with schemas, queues, provenance, and deployment topologies — rather than as content in markdown files?

Spec: https://github.com/noirblue/IsaacCLupus_mnemosyn_spec

@iddingszhz
Copy link
Copy Markdown

我按照 Karpathy 的思路开发了一个日记库,充分运用 Karpathy 的 "固定协议 + 可插拔模块" 架构。

用在自我认知上,比用在代码上更让人震撼。

CLAUDE.md 作为行为协议来约束 AI,而不是堆提示词。把 AI 从一个"聊天对象"变成了一只严格执行 SOP 的 analytic engine。

同时采用 协议和人格分离 的设计:协议保证行为可预期、可迁移(换 AI 客户端也不崩),人格随时换文件就能换风格——这是系统工程思维在个人管理上的降维打击。

日记架构按照 L0 → L4 的分层深化

  • 大多数日记工具停留在 L0(记录)
  • 好一点的做到 L1(周回顾)

而我把跨时间复利对比(L3 / L4)直接做成内建功能。当日记从"散装存档"变成"可检索的认知档案",它就不再是日记了——是一个人的增量自我进化系统

👉 https://github.com/iddingszhz/Life_Daliy_OS

@paulmchen
Copy link
Copy Markdown

Synthadoc v0.6.0 is out.

Two new features focused on content health and portability:

  1. Five-State Page Lifecycle: pages now move through five states: draft → active → stale → contradicted → archived. Transitions are mostly automatic: a clean lint pass promotes draft to active; a SHA-256 hash mismatch on the source file marks the page stale; ingesting a conflicting source marks it contradicted. Every transition is permanently logged with who triggered it, when, and why, so you have an auditable trail of when content was reviewed, not just who last touched the file. Manual overrides (lifecycle activate, archive, restore) are there when automation isn't enough.

→ Quick-start demo - Step 8 (https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md#step-8--manage-page-lifecycle)

  1. Wiki Export: four formats in one command: synthadoc export -f . llms.txt and llms-full.txt follow the llmstxt.org spec for feeding the wiki into downstream LLM workflows. "graphml" exports the wikilink graph with node attributes (status, confidence, citation count) for analysis in Gephi or yEd. "json" export includes three fields no generic wiki export produces: every paragraph traces back to exact source lines (claims[] -file name plus line range, not just "from document X"); the complete state-transition audit log (lifecycle_history[] - when the page was drafted, reviewed, went stale, re-ingested); and the per-page API cost to compile it (ingest_cost_usd).

→ Quick-start demo - Step 21 (https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md#step-21--export-your-wiki)

  1. Also shipping in v0.6.0: refined Obsidian plugin UX across lifecycle and export modals.

If any of this is useful or you have thoughts on the direction, feedback is very welcome, and a ⭐ always helps the project reach more people.

👉 https://github.com/axoviq-ai/synthadoc

@Yarmoluk
Copy link
Copy Markdown

Yarmoluk commented May 30, 2026 via email

@theafh
Copy link
Copy Markdown

theafh commented May 30, 2026

This wiki pattern worked so well for knowledge that I tried the same
plain-markdown-over-LLM idea on the other half of the AI-developer workflow!
If the /wiki is a Confluence you can talk to, /task is the JIRA/Trello version:
a backlog that lives next to the code as files, not in a SaaS.

Same principles, different artifact:

  • One self-contained markdown file per task, written so a single-shot agent
    can pick it up and implement it with no chat context.
  • Filesystem-native and git-tracked: tasks/ for open work, tasks/archive/
    for done/dropped, scope in the filename. No lock-in, diffs, and history for free.
  • A bundled linter keeps it healthy — naming, frontmatter, status↔location,
    links, dates (the task analogue of the wiki's lint pass).
  • A small lifecycle of skills instead of one mega-prompt:
    create → check → implement → audit → finish, plus a whole-tree fix/health pass.

Open-source, MIT License, Multi-agent skill deployment:
https://github.com/theafh/ai-modules/tree/main/plugins/ai_dev/skills

@hegu-1
Copy link
Copy Markdown

hegu-1 commented May 31, 2026

Been circling the same three-layer shape (raw → wiki → outputs, markdown, pre-compiled, wiki-linked) — but I came at it from a different angle: not "knowledge base as a product," more "a container for my own continuity" that any model loads to stay in phase with me across sessions and machines.

The one place I deliberately diverge from your setup: I don't let it drift toward "none written by me directly." The model writes the derivative / linking / consistency layer; the framing and the decisions stay authored by me — otherwise the continuity quietly stops being mine.

Which makes me wonder about something bigger than a second brain: if everyone runs a personal container like this and the model becomes the interchangeable part, the interesting unit isn't one general model anymore — it's (person × container × commodity intelligence). Feels less like "AGI living in a model" and more like generality relocating into millions of personal substrates. Curious whether you see it pointing that way too.

Open-sourced the bare structure if it's useful to anyone: https://github.com/hegu-1/personal-memory-vault-starter

@MarcoPorcellato
Copy link
Copy Markdown

Architecture Discussion: Overcoming the "Flat Markdown Mutation Problem" in LLM-OS environments (Flat Files vs Outliner AST)

LOGSEQ (block granularity) vs Obsidian (page granularity) and my Matryca Plumber

Hi Andrej / everyone,

I’ve been heavily inspired by your concepts around the LLM OS and using flat Markdown as the filesystem for AI agents. I’ve been building autonomous loops using this exact paradigm, but I hit a massive architectural wall when using standard editors (like Obsidian) as the frontend.

The Problem: The "Flat File Mutation" Danger
When an AI agent tries to autonomously inject properties, update status tags, or surgically edit a document in the background while the user is typing, flat Markdown fails.

Standard LLMs struggle to parse the exact line numbers or preserve complex metadata in flat .md files without corrupting the surrounding text or triggering race conditions (causing data loss).

The Workaround/Experiment
I realized that for an LLM to truly act as a safe background daemon, the filesystem needs an inherently rigid structure. I pivoted the experiment to LOGSEQ (which stores data as a strict Block-based Outliner AST, with UUIDs per block) and implemented an Optimistic Concurrency Control (OCC) locking mechanism via mmap.

By doing this, the LLM doesn't have to guess where to inject data. It targets the block UUID.

  1. The human types on the frontend.
  2. The AI reads the AST, computes semantic tags, and injects properties in the background.
  3. Zero file corruption.

I built an open-source Python daemon to prove this concept (matryca-plumber), fully protecting the AI-to-File boundary with strict token limits and JSON recovery. It has also an MCP and CLI to dialogue with notes and write directly to .MD files.

https://github.com/MarcoPorcellato/matryca-plumber

I wanted to share this architectural finding here because as we build out the "LLM Wiki" or "LLM OS" concepts, shifting the storage paradigm from Flat Text AST to Block Outliner AST seems to be the only way to grant LLMs safe, concurrent write-access to human notes.

Would love to hear your thoughts on concurrency and background mutations in your Obsidian setups!

@NicoBleh
Copy link
Copy Markdown

Took this pattern in a security direction and wanted to share the concept in case it's useful to others here.

The thing that struck me: an autonomously-ingesting wiki is itself an indirect-prompt-injection surface. A crafted source can plant instructions that persist into the wiki and poison later sessions. The lint step in the original idea checks correctness and contradictions – but not adversariality. Those are orthogonal axes.

So I built a version hardened around one invariant: untrusted input (a source) must never reach a channel later treated as trusted (the wiki). Concretely:

  • source text nonce-delimited and declared untrusted at every model boundary (extraction, review, read-time)
  • an independent second model reviews each write for manipulation, not correctness (four-eyes)
  • trust-tiering by host; weakest level propagates
  • git-backed provenance per claim, so a poisoned entry is findable and revertible
  • an injection corpus mapping each attack to the gate it must fail at, plus OWASP LLM Top 10 / MITRE ATLAS

It's defense-in-depth, not a guarantee – the sanitizer is heuristic and the reviewer is still an LLM. Code and the threat-model write-up:
https://github.com/NicoBleh/secure-llm-wiki

@vforvalerio87
Copy link
Copy Markdown

I started doing exactly this a few months ago. Started with my company's "sales and marketing", then my personal stuff, now every code repository. raw-data directory, steering files to process input into the knowledge base, all files reference each other to build a tree, linting rules, a concept of "knowledge / archive" vs. "current state" which can be either a brief of the most current action items, or the most recent reads on a bunch of things, or what I call "initiatives" which is: something with a start date, an end date, a goal to achieve and a way to measure it. Then there's the linting rules and a "state" file which is every single piece of raw-data ever processed, what documents it was distilled to, what was the original file name and a brief description. The state is committed, the raw-data isn't. Then since it's domain-specific knowledge-bases each has a few concepts and ontologies of stuff it needs to track in the documents and how they must be processed. So my workflow is usually dump a bunch of stuff in raw-data, tell the kb to eat it and it's done. It's got a rule so that sessions can't be load bearing, context is always ephemeral, everything must be crystallized unless it's really unimportant or noise.

@GhostBoyBoy
Copy link
Copy Markdown

AI-Collaborative Project Workspace: An Open Proposal for the Next Generation of Software Engineering Directory Structures
https://gist.github.com/GhostBoyBoy/88b02a1ace885cffd5b576ce621eb13a

@Shilren
Copy link
Copy Markdown

Shilren commented Jun 1, 2026

Thanks for this writeup — the core reframe, "don't re-retrieve raw docs at query time, let the LLM incrementally maintain a persistent wiki," genuinely changed how I approached a problem I'd been stuck on. Sharing what I built on top of it.

The problem. Like a lot of people, my work experience was scattered across project retros, weekly reports, chat logs, and my own head. Every time I wrote a resume or prepped an interview answer, I re-dug through everything and re-fed it to an LLM — expensive, slow, lossy, and worst of all the numbers never matched across versions (a metric would be "22" in the resume and "23" in the spoken script, because each was written from scratch). I realized most people treat AI as a one-shot rewriter, when the real leverage is a reusable, single-source-of-truth knowledge layer — which is exactly your LLM Wiki point.

What I builtinterview-doc-agent (https://github.com/Shilren/interview-doc-agent), an open-source skill (MIT) that maps onto your three layers almost one-to-one:

  • materials/raw sources. Paste anything, unstructured.
  • experience-library/the wiki layer. The LLM consolidates each project into a clean, interview-ready doc following a fixed schema: one-liner → situation/problem/actions → every quantified metric preserved → transferable strengths.
  • index.mdthe index. The agent reads this first to locate the 1–2 relevant docs, instead of loading the whole library.
  • SKILL.mdthe schema. Behavior config: when to trigger, what steps to follow, hard rules (e.g. never fabricate a number — leave a ___ placeholder if missing).

The workflow is: drop raw notes in → "consolidate into the library" (ingest once) → then every resume, interview script, or JD-tailored variant generates from the same library, so the numbers are always consistent. New experiences compound the library instead of being re-derived each time.

The part I most wanted to add — context vs. RAG is fundamentally a question of magnitude. A lot of "knowledge base + AI" projects reach for RAG by default, but I think that's often premature. The decision tree I landed on:

  • < ~50k–100k tokens (~150–200 dense pages): context (LLM Wiki) wins, decisively. 100% retrieval reliability (no missed chunks, no semantics broken by chunking), near-zero infra (well-structured Markdown — no vector DB, no embedding pipeline, no chunking/retrieval tuning, hours instead of weeks), and global reasoning over the whole corpus rather than stitching isolated snippets — which matters a lot for multi-hop or "summarize across everything" tasks.
  • Millions of tokens and up: RAG is the only option — it won't fit, so retrieval is how you scale.
  • In between / production: hybrid — the stable, core knowledge in context, the dynamic/massive/per-user records behind RAG.

A personal experience base, once consolidated, is only a few thousand to ~20k tokens — far below the line — so RAG would be pure overhead and a reliability downgrade. And to be precise: my index.md isn't RAG. It does no vector matching and no chunking; it just lets the agent open fewer whole files as the library grows. Even reading the entire library would comfortably fit in context — the index is an optimization, not a retrieval system. With modern context windows at 200k–1M+ tokens, the ceiling for the pure-context approach keeps rising.

Net: below that magnitude threshold, the LLM Wiki approach is both simpler and more reliable than RAG. Thanks again — this writeup saved me from over-engineering the whole thing.

@Shilren
Copy link
Copy Markdown

Shilren commented Jun 1, 2026

Karpathy说的「别每次都去翻原始资料,而是让 LLM 一点点维护一个长期的 wiki」这句话,真的点醒了我一个一直没想通的问题。

说说我照着这个思路做的东西。我遇到的问题:我的工作经历到处都是 —— 项目复盘、周报、聊天记录、还有一些只在脑子里。每次写简历或者准备面试,都得重新翻一遍、再喂给 AI,又费钱又慢还容易漏东西。最烦的是,同一个数字在简历里写"22"、在面试稿里却记成"23",因为两份是分开写的。我后来想明白:大多数人只是把 AI 当成"帮我改一次"的工具,但其实更值钱的是有一个能反复用、数据只有一个出处的知识库 —— 这就是你说的 LLM Wiki。

我做的东西 —— interview-doc-agent (https://github.com/Shilren/interview-doc-agent),一个开源,%E4%B8%80%E4%B8%AA%E5%BC%80%E6%BA%90) skill(MIT),几乎一对一对应你的三层:

materials/ → 原始素材,什么格式都行,粘进来就好
经历库/ → wiki 层,AI 把每个项目整理成一份面试能直接用的档案(一句话 → 背景/问题/做法 → 所有数据都留住 → 可迁移的亮点)
index.md → 索引,先读它找到相关的 1-2 篇,而不是全部加载
SKILL.md → schema,行为说明(什么时候用、按什么步骤、有哪些硬规则,比如"绝不编数字")
我最想补充的一点 —— 到底用上下文还是 RAG,本质上是个量级问题。 很多"知识库 + AI"的项目张口就上 RAG,但我觉得这往往是过度设计了。我自己理出来的判断标准是:

小于约 5万–10万 token(大概 150–200 页):上下文(LLM Wiki)完胜 —— 检索可靠性 100%(不会漏匹配、也不会因为分块把语义切断)、几乎不需要任何基建(一个结构清晰的 Markdown 就够,不用向量库、不用嵌入、不用分块)、而且能对全局直接推理,而不是把几个零散片段拼起来
几百万 token 以上:只能用 RAG —— 塞不进上下文,检索是唯一能扩展的办法
介于两者之间 / 生产系统:混合 —— 把最核心稳定的知识放进上下文,把海量、动态的数据交给 RAG
一份个人经历库整理完也就几千到两万 token,远在这条线以下,所以上 RAG 纯属增加负担、还把可靠性变差了。再澄清一点:我那个 index.md 也不是 RAG —— 它不做向量匹配、不分块,只是在经历变多以后让 AI 少读几个完整文件而已;就算全读进去也放得下,索引只是个优化,不是检索系统。现在主流模型上下文都到 200k 甚至 100 万+ token 了,纯上下文这条路的上限只会越来越高。

只要在这个量级以下,LLM Wiki 比 RAG 又简单又可靠。

@skyllwt
Copy link
Copy Markdown

skyllwt commented Jun 2, 2026

Love this LLM-Wiki idea? We built a full open-source system on it → AutoSci

Come and enjoy: https://github.com/skyllwt/AutoSci

autosci_fig1_v1-第 15 页 drawio png

Karpathy's pattern (immutable raw/ → an LLM-compiled wiki/ → a CLAUDE.md schema) is the exact foundation of AutoSci — an agent that turns the wiki into a research memory and then does autonomous science on top of it. All on Claude Code:

  • 📚 Ingest papers into a cross-linked wiki of concept/method/idea pages ([[wikilinks]], contradiction edges — the LLM-Wiki, fully realized)
  • 💡 Ideate → experiment → write: it reads its own memory to generate ideas, design + run experiments, draft the paper, and handle rebuttals
  • 🧬 Self-evolving memory: between projects it consolidates, re-weights, and re-links the wiki (a "sleep" phase)
  • 🕸️ Multi-agent DAGs for the hard reasoning steps

We've already used it to write 3 papers end-to-end. Fully open (MIT).

⭐ If this is where you want LLM-Wikis to go, a star genuinely helps us! and we welcome issues/PRs/Contributors
👉 https://github.com/skyllwt/AutoSci
📄 Paper: https://arxiv.org/abs/2605.31468

Demo: 【北大做了一个会自我进化的科研 Agent:AutoSci】 https://www.bilibili.com/video/BV19gVg6pEk6/?share_source=copy_web&vd_source=338de971cb27f42aaaf5d8bfdeed04b3

截图 2026-06-02 09-03-07

RED: 北大做了一个“越做科研越聪明”的AI科学家 北大团队... http://xhslink.com/o/2clEkgugEPw
复制后打开【小红书】查看笔记!

@Z-M-Huang
Copy link
Copy Markdown

Dense-Mem: memory beyond RAG

dense-mem-memory-beyond-rag

I built an open-source MCP memory server called Dense-Mem:

https://github.com/markhuangai/dense-mem

I built an open-source MCP memory server called Dense-Mem:

https://github.com/markhuangai/dense-mem

The idea I am trying to explore is that RAG is very useful for retrieval, but durable AI memory may need more than vector search alone.

In daily LLM workflows, the hard problems are often not just "find a similar chunk." They are questions like:

  • What did the user actually say?
  • Is this evidence, a proposed claim, or an accepted fact?
  • Is there a newer fact that supersedes the old one?
  • Are two memories in conflict?
  • What source supports this answer?
  • Can the same memory be reused across different AI tools?

So Dense-Mem separates the host LLM from the memory layer:

  • the host LLM handles conversation and judgment
  • Dense-Mem stores evidence, typed claims, accepted facts, provenance, conflicts, embeddings, and graph recall
  • MCP clients can recall the same durable memory instead of rebuilding context from scratch every session

I wrote more about the motivation here:

https://markhuang.ai/blog/ai-memory-beyond-rag

And I made a hosted demo so people can try it without self-hosting:

https://markhuang.ai/blog/dense-mem-hosted-demo-test-instance

I do not want to overclaim that this "solves memory." I am mostly curious whether this architecture feels useful, overbuilt, or missing something important.

If anyone has thoughts, I would especially appreciate critique on:

  1. Does the memory-server / host-LLM boundary make sense?
  2. Is graph-backed memory a useful abstraction here, or should this remain simpler?
  3. What evaluation would best show whether this improves accuracy in real daily LLM workflows?
  4. What failure modes should I test before claiming this direction is useful?

Thanks for reading. I would genuinely value technical pushback.

@mikhashev
Copy link
Copy Markdown

Follow-up on our knowledge-as-weights post. Last time we asked: what if a personal model could learn and grow with you -- encoding confirmed knowledge directly in weights, adding new facts as you learn them, no RAG needed? Two weeks and 52 experiments later, we have a partial answer: below ~10M parameters, we could not make knowledge-in-weights work via any mechanism we tested -- LoRA injection, test-time training, or architecture changes. Here's the journey.

The vision: We're building DPC Messenger -- peer-to-peer infrastructure for human-AI co-evolution. A space where people and AI grow together. The target (knowledge-in-weights) architecture (designed, not yet implemented) uses continual LoRA adapters per knowledge cluster, with a perceptron router for O(1) adapter selection at inference. Three knowledge types are planned: static facts, dynamic state, and predictive hypotheses -- each with their own lifecycle in weights. The model would learn and grow as the person confirms new knowledge -- not one-shot injection, but continual learning. What's built today: the knowledge graph, the training pipeline, and the diagnostic tooling. What's not built yet: the adapter routing, the knowledge-type lifecycle, and the continual learning loop itself. But first we needed to know: how small can the knowledge model be? This post maps the lower bound.

Phase 1: "Can a small model learn our domain?" (11 experiments)
We adapted the autoresearch framework (based on karpathy/nanoGPT) with a custom training pipeline and experiment workflow for domain-specific knowledge encoding. All 52 experiments are logged with full reproducibility metadata (hyperparameters, timings, per-epoch metrics). We trained on a personal knowledge corpus -- 278K documents (~7.5M tokens) of domain-specific triples from a knowledge graph plus natural text. Question: can a tiny model reach reasonable perplexity on structured knowledge?

TinyStories baseline: val_bpb 1.221. Our corpus: 1.682. The domain gap is real -- structured triples are harder than simple narratives. We tried dropout (no effect), smaller models (worse at same compute budget), larger models (overfit). The floor on our corpus is ~0.98 val_bpb, regardless of architecture (confirmed across two independent architectures: 4L/256d at 3.7M params and 8L/128d at 2.0M params). Width matters more than depth at fixed compute -- 4L x 256d beats 8L x 128d consistently. Seed variance dominated architectural differences in our measurements. The model learns the domain, but the data ceiling is immutable at this corpus size.

Phase 2: "Does architecture matter?" (21 experiments)
Systematic sweep of 5 architectural knobs: MLP ratio, GQA, value embeddings, RMSNorm, weight tying. Answer: mostly noise at this scale. One exception -- weight tying saves 50% of parameters with zero quality loss, giving us our final 3.7M param model. BERT-style encoder tested at matched scale (~30M params) -- decoder beats encoder by +1.24 bpb. Value embeddings are load-bearing (removing them costs +0.116 bpb at matched parameter count). The take-away: at sub-10M scale, architecture is second-order. The first-order constraints are data and compute.

Phase 3: "Can we inject knowledge via gradients?" (600 combinations)
Test-time training (TTT) sweep: 600 combinations of learning rate, steps, and fact count on a 37M param model. Result: 0/600 newly recalled facts. The model overfits on injection token patterns while loss drops to 0.01. It memorizes the surface form of injection text but cannot extract semantic subject-predicate-object associations. Via teacher forcing, entity accuracy is 19.4% -- facts are weakly present in weights, but free recall fails completely.

Phase 4: "Can LoRA inject knowledge?" (4 experiments + diagnostics)
This is where we expected success. Schulman et al. (2025) show LoRA injection works reliably when per-token probability exceeds p=0.5 on 7B+ models. We tried it at 3.7M.

Config LoRA target Rank Params injected LR top-1 recall top-10 collapse
MLP-only c_fc, c_proj 16 164K (4.5% of model) 1e-4 0.49% -> 0% by epoch 3 epoch 6
MLP-only c_fc, c_proj 16 164K 1e-3 0.49% -> 0% by epoch 1 epoch 2
Attn-only c_q, c_v 16 65K (1.8% of model) 1e-4 0.49% -> 0% by epoch 4 by epoch 9

Loss decreased monotonically in every run while recall died. The model learned to minimize loss on injection text via existing non-factual patterns -- not by encoding new facts.

The diagnostic that explained everything:
We built a 5-metric diagnostic pipeline (D1-D5) and measured per-token probability p(fact) -- the model's confidence on correct factual completions. Result: zero tokens above p=0.5 even BEFORE LoRA training. Out of 1,214 injected facts: max p(fact) = 0.29, mean = 0.008. Post-LoRA training: max dropped to 0.02, mean to 0.002.

Schulman et al. (2025) show LoRA injection works reliably above the p=0.5 threshold. Our model's ceiling is ~60x below that. This rules out "LoRA optimization problem" and confirms "structural capacity problem" -- the model cannot encode facts at sufficient probability for stable recall. No amount of adapter tuning fixes this.

Additional diagnostic: base model gradients are 600-5000x larger than LoRA gradients (ratio grows as training progresses). The adapter barely influences the model despite measurable loss reduction.

Why existing literature didn't predict this:
Three recent LoRA knowledge-injection papers (Schulman 2025, "Understanding LoRA as Knowledge Memory" ICML 2026, "LoRA Rank Trade-offs" Dec 2025) all test on 7B+ parameters -- three orders of magnitude larger than our model. At 4 layers, MLP-only LoRA perturbs 50% of the model; attention-only perturbs 18%. At 24+ layers those numbers drop to ~4% and ~1.5% respectively. LoRA assumes deep enough models that adapters are small perturbations -- that assumption breaks below ~10M parameters.

What's next:
Scaling up to 10-50M parameters (12-16 layers) where LoRA perturbation drops below 5%. Running the same D1-D5 diagnostic pipeline on each checkpoint to find where p(fact) crosses the 0.5 threshold. If it does, we have the minimum viable model size for personal LoRA knowledge injection. If it doesn't, the knowledge-as-weights hypothesis needs a fundamentally different injection mechanism at this scale.

Hardware: All experiments on a single RTX 3060 12GB (Xeon E3-1240 V2). Full scaling-laws grid (~9 configs) runs in ~1.5 hours. LoRA experiments + diagnostics: ~2 hours. This is entirely reproducible on consumer hardware.

@MuhammadSaqlainAslam
Copy link
Copy Markdown

MuhammadSaqlainAslam commented Jun 3, 2026

Hi Andrej — your gist inspired this project at the AI Research Center, Hon Hai Research Institute (Foxconn).
🌐 Live demo: https://muhammadsaqlainaslam.github.io/my-llm-wiki
📦 Repo: https://github.com/MuhammadSaqlainAslam/my-llm-wiki
🏛️ Institute: https://hhri.foxconn.com/en
What we built on top of your pattern:
🤖 Agentic ingestion — one command searches arXiv, GitHub, blogs, and YouTube transcripts, generates structured notes, and deploys automatically:
python3 agent.py topic "Mamba 2 SSM improvements"
📚 Citation intelligence — every paper shows its total citations plus the top 10 most-cited papers that reference it, tracing the full intellectual thread from the 2017 Transformer through 2025 inference optimization.
🌐 Interactive web demo — browse 150+ notes, full-text search with Ctrl+F-style match navigation, side-by-side paper comparison, D3 knowledge graph where node size = citation count, and a timeline from 2017→2026. Fully mobile responsive.
🔗 Deep cross-linking — every note shows backlinks. "Attention is All You Need" has 46 backlinks — the most referenced paper in the wiki, as expected.
Coverage: Transformer → FlashAttention → S4 → Mamba → Mamba-2 → Mamba-3 → xLSTM → RWKV → RetNet → Griffin → Speculative Decoding → KV Cache → DeepSeek-V4 · 50+ concept glossary stubs
Stack: Claude API (Vertex AI) · PyMuPDF · D3.js · KaTeX · GitHub Pages · Obsidian + Dataview

@kytmanov
Copy link
Copy Markdown

kytmanov commented Jun 3, 2026

Synto v0.5.0 is out.

https://github.com/kytmanov/synto

  • Main addition: per-role providers. Run each model where it makes sense - small fast model on local Ollama, heavy writing model on a cloud endpoint. Each role gets its own provider, connection, and key. synto setup walks you through the split.

Also new in v0.5:

  • rename a concept everywhere: synto concept rename OLD NEW moves the article, repoints every inbound link, and migrates the state DB.
  • per-model knobs: context size, temperature, thinking on/off. Thinking models (Qwen 3.5, DeepSeek-R1) no longer time out on ingest.
  • Anthropic-compatible API support, on top of OpenAI-compatible - point any role at Kimi and similar endpoints.

Same shape as before:

  • works with local LLMs
  • great with Ollama and LM Studio
  • plain Markdown
  • Obsidian-friendly
  • no vector DB
  • no cloud required
  • multi-language

Star it if you want to support local-first AI tools. Fork it if you want to build on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment