@brtrx
Last active May 28, 2026 13:29
LLM Wiki: A Bias-Aware Stateful Lint

LLM Wiki Lint — Stateful, Batched, Randomised

An extension of Andrej Karpathy’s LLM Wiki pattern that makes lint work at scale — and addresses a deeper problem the base pattern doesn’t fully solve.


The Root Problem: Ingestion Bias

In the base LLM Wiki pattern, sources are ingested one at a time. Each new source creates or updates wiki pages. This works well, but it introduces a systematic bias: early sources disproportionately shape the wiki.

The first source to introduce a concept creates its page — setting the terminology, the framing, the scope. Later sources get reconciled into that frame rather than challenging it. The model tends to update and extend existing pages rather than restructure them. Over time, the wiki’s ontology reflects the order things arrived, not the relative importance or representativeness of sources.

This is an epistemic problem, not a structural one. It doesn’t show up as a broken link or a missing page. It shows up as concept pages that have a particular slant, blind spots in how topics are connected, and a wiki that quietly overfits to whichever sources came first.


What Lint Can and Can’t Fix

Karpathy’s pattern includes a lint operation — a periodic health check for orphaned pages, broken links, contradictions, and stale claims. Lint addresses the structural layer of the bias problem: it can surface contradictions between pages, flag missing concepts, and catch claims that newer sources have superseded.

What lint can’t do is detect framing bias directly. It won’t tell you that concepts/attention-mechanism.md is written from the perspective of source #3 rather than the field at large. That requires something different — periodic full reprocessing, or explicit prompts asking Claude to steelman alternative framings of key pages.

Randomising the lint order is a partial bridge between these two layers. Because lint visits sources in an order unrelated to ingestion order, it’s more likely to surface contradictions between early and late sources that order-biased passes would miss. It won’t fix framing, but it makes the structural repair more honest.


The Practical Problem: Scale

Even setting aside bias, "lint the wiki" breaks down on a mature wiki for a second reason: context limits. On a wiki built from 50+ sources, a single lint pass can exhaust the context window mid-run. The model then becomes less thorough without telling you what it missed.


The Solution

Three additions to the base pattern:

1. Batched sessions. Lint runs 5 sources per session, protecting the context window and making each pass reliable and completable.

2. Randomised order. The LLM shuffles the lint order once, so it is neither alphabetical nor ingestion order, and stores it in lint-status.md. The order then stays fixed across sessions so progress is resumable and the randomisation is stable. This counteracts topic clustering and makes cross-cutting connections more likely to surface.

3. Persistent scratchpad. Findings accumulate in lint-scratch.md across sessions. Issues are annotated FIXED rather than deleted, creating a changelog. The model reads the scratchpad at the start of each session, so it’s aware of patterns found in previous batches — including issues that recur across multiple sources.

The interface stays simple: "lint the wiki" or "continue lint". All complexity lives in CLAUDE.md and the two meta files.
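
The batching and resumability described above can be sketched in a few lines of Python. This is an illustration only, not part of the pattern: in the gist, Claude reads and updates the tracker itself. The helper names `next_batch` and `mark_done` are hypothetical, and the line format is assumed to match the `- [ ] 1. source-slug` entries in the lint-status.md template.

```python
import re

BATCH_SIZE = 5  # sources per lint session, as described above

# Matches tracker lines such as "- [ ] 3. source-slug" or "- [x] 1. source-slug".
STATUS_LINE = re.compile(r"^- \[(?P<mark>[ x~])\] (?P<num>\d+)\. (?P<slug>\S+)\s*$")


def next_batch(status_text: str, batch_size: int = BATCH_SIZE) -> list[str]:
    """Return the slugs of the next unchecked sources, in stored order."""
    batch = []
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group("mark") == " ":
            batch.append(match.group("slug"))
            if len(batch) == batch_size:
                break
    return batch


def mark_done(status_text: str, slugs: list[str]) -> str:
    """Flip '[ ]' to '[x]' for the given slugs, leaving the order untouched."""
    done = set(slugs)
    out = []
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group("slug") in done and match.group("mark") == " ":
            line = line.replace("[ ]", "[x]", 1)
        out.append(line)
    return "\n".join(out)
```

Because the order lives in the file and is never recomputed, "continue lint" only needs to find the first unchecked entries; nothing about the previous session has to be remembered.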


What This Doesn’t Solve

Lint — even well-designed lint — cannot fully repair framing bias baked in at ingest time. If your wiki was built from 100+ sources without linting, the structural layer can be repaired with this approach. The conceptual framing layer requires additional work:

  • Periodic full reprocessing — delete wiki/ and re-ingest all of raw/ from scratch, ideally in a different order
  • Explicit challenge prompts — ask Claude to argue against the current framing of high-traffic concept pages
  • Targeted rewrite sessions — for pages you suspect are over-indexed on early sources, ask Claude to rewrite them using only late-ingested sources, then reconcile

Files

wiki/
  lint-status.md     ← randomised source order + per-source status
  lint-scratch.md    ← running findings log across all batches
CLAUDE.md            ← add the lint workflow block from Bias-aware-stateful-lint-workflow.md

Setup

  1. Add the lint workflow block from CLAUDE-lint-section.md to your existing CLAUDE.md.
  2. Ask Claude to create wiki/lint-status.md:

“Create lint-status.md. List all sources from the index in a randomised order — not alphabetical, not ingestion order. Label this seed 42. Mark all sources [ ].”

  3. Create an empty wiki/lint-scratch.md using the template.
  4. Say "lint the wiki" to begin.
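
In the setup above the shuffle is done by Claude and "seed 42" is only a label. If you would rather have a shuffle you can reproduce exactly, a minimal sketch with Python's standard `random` module might look like the following. The function name `build_lint_status` is hypothetical, and the output format is assumed to match the lint-status.md template.

```python
import random


def build_lint_status(slugs: list[str], seed: int = 42) -> str:
    """Render a shuffled, all-unchecked source list in the tracker's line format."""
    order = sorted(slugs)               # normalise input so the result depends only on the seed
    random.Random(seed).shuffle(order)  # private RNG: same seed and slugs give the same order
    lines = [f"- [ ] {i}. {slug}" for i, slug in enumerate(order, start=1)]
    return "\n".join(lines)
```

Run it once and paste the result under the Sources heading. Regenerating after new sources are added would change the order, which defeats the stable-order requirement, so later sources are probably better appended by hand.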

Status Markers

| Marker | Meaning |
| --- | --- |
| `[ ]` | Unchecked |
| `[~]` | Partial: hub page with many inbound links, needs revisiting once all contributing sources are linted |
| `[x]` | Done |

Hub pages are marked [~] when first touched and only promoted to [x] once all sources that link to them have been linted. This prevents falsely marking a heavily-linked concept page as clean after a single pass.
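
The promotion rule for hub pages can be written as a small set comparison. This is a sketch under assumptions: `hub_status` is a hypothetical helper, and "contributing sources" is taken to mean the set of source slugs that link to the page.

```python
def hub_status(contributing: set[str], linted: set[str]) -> str:
    """Return the marker for a hub page given which contributing sources are linted."""
    if not contributing & linted:
        return "[ ]"   # no contributing source has been linted yet
    if contributing <= linted:
        return "[x]"   # every contributing source is done
    return "[~]"       # touched, but some contributing sources remain
```

For example, a page linked from sources `a` and `b` stays at `[~]` after only `a` has been linted, and moves to `[x]` once both are done.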


Convergence Signal

The scratchpad gives you a natural indicator of wiki health. As the wiki matures, the rate of new findings per batch should drop. When a full pass surfaces only format issues and no contradictions or missing pages, the structural layer has stabilised. Framing bias won’t show up in this signal — that requires the separate interventions above.
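
One way to read this signal without eyeballing the file is to count findings per batch and category. The sketch below assumes the scratchpad follows the template's conventions (`## Batch N` headers and `- **Category**: ...` finding lines); the function names are hypothetical, and counts include findings already annotated FIXED, since the signal is about how many new issues each batch surfaced.

```python
import re
from collections import Counter

BATCH_HEADER = re.compile(r"^## Batch (\d+)")
FINDING = re.compile(r"^- \*\*(?P<category>[^*]+)\*\*")


def findings_per_batch(scratch_text: str) -> dict[int, Counter]:
    """Count findings by category for each batch in the scratchpad."""
    counts: dict[int, Counter] = {}
    current = None
    for line in scratch_text.splitlines():
        header = BATCH_HEADER.match(line)
        if header:
            current = int(header.group(1))
            counts[current] = Counter()
            continue
        finding = FINDING.match(line)
        if finding and current is not None:
            counts[current][finding.group("category").strip()] += 1
    return counts


def is_converged(counts: dict[int, Counter]) -> bool:
    """True when the given batches reported nothing beyond format issues."""
    return bool(counts) and all(set(c) <= {"Format issues"} for c in counts.values())
```

Pass `is_converged` only the batches from the most recent full pass; earlier passes will naturally contain contradictions and missing pages that have since been fixed.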


Credit

Built on Andrej Karpathy’s LLM Wiki gist. This is an extension, not a replacement — the base pattern, folder structure, and ingest/query workflows are his.

# CLAUDE.md — Lint Workflow Block
Drop this section into your existing `CLAUDE.md`. It replaces or extends any existing lint instructions.
-----
## Workflow 3: Lint
**Trigger:** Human says “lint the wiki”, “continue lint”, “health-check”, or “what’s missing”.
### Process
Lint runs in **batches of 5 sources per session** to protect context. The order is randomised (not alphabetical or ingestion-order) to counteract topic clustering and reveal cross-cutting connections that order-biased passes miss. The randomised order is stored in `wiki/lint-status.md` (seed 42) and must not be re-shuffled between sessions.
### Steps
1. Read `wiki/lint-status.md` to find the next 5 unchecked sources (`[ ]`).
1. Read `wiki/lint-scratch.md` to load findings from previous batches and check for unresolved patterns.
1. For each source: read the source page, then read every entity and concept page it links to.
1. Append findings for each source to `wiki/lint-scratch.md` under a new batch header.
1. After all 5 sources, re-read the scratch entries for this batch and note any **cross-cutting patterns** (e.g. the same broken link appearing in multiple sources = higher priority fix).
1. Execute all fixes for the batch.
1. Mark all 5 sources `[x]` in `wiki/lint-status.md`. Hub concept/entity pages touched get `[~]` if not all contributing sources have been linted yet; promote to `[x]` only when all contributing sources are done.
1. Append one summary entry to `wiki/log.md`.
### Finding categories (use these headings in lint-scratch.md)
- **Broken wikilinks** — link target doesn’t match any slug in the index
- **Missing pages** — entity/concept linked inline but no page exists
- **Stale sources lists** — concept/entity page’s `sources:` frontmatter missing a source that links to it
- **Stubs** — pages with thin content that warrant expansion
- **Contradictions** — claims that conflict across pages
- **Stale claims** — things newer sources may have superseded
- **Format issues** — frontmatter missing required fields, wrong link syntax, etc.
---
title: Lint Scratch Pad
type: meta
updated: YYYY-MM-DD
---
# Lint Scratch Pad
Running findings file. Batch headers mark session boundaries.
Findings are annotated `FIXED` once edits are applied — not deleted — so the file
doubles as a changelog and lets subsequent passes detect recurring patterns.
---
<!-- Claude appends new batches below. Do not manually reorder entries. -->
## Batch 1 (sources #1–5)
### 1. source-slug
- **Missing pages**: `[[concept-name]]` linked but no page exists → FIXED (created `concepts/concept-name.md`)
- **Broken wikilinks**: `[[old-slug]]` → should be `[[new-slug]]` → FIXED
### 2. source-slug
- *No findings.*
<!-- Cross-cutting patterns noted after batch: -->
> **Pattern**: `[[concept-name]]` appears broken in sources #1 and #3 — higher priority, likely affects more pages downstream.
---
title: Lint Status Tracker
type: meta
updated: YYYY-MM-DD (batch 1)
---
# Lint Status Tracker
Order is randomised (seed 42) to counteract ingestion-order bias in concept coverage.
Status: `[ ]` = unchecked · `[~]` = partial (hub page, revisit) · `[x]` = done
Each checked source also lists the entity/concept pages touched in that pass.
-----
## Sources
<!--
SETUP: Ask Claude to populate this list before your first lint session:
"Create lint-status.md. List all sources from wiki/index.md in a randomised order —
not alphabetical, not ingestion order. Label this seed 42. Mark all sources [ ]."
Claude will generate the shuffled list and paste it here. Do not re-shuffle between
sessions — the stable order is what makes the randomisation meaningful.
-->
- [ ] 1. source-slug
- [ ] 2. source-slug
- [ ] 3. source-slug
<!-- ... -->