Hyun Yi hyuunnn

LLM Wiki v2

A pattern for building personal knowledge bases using LLMs. Extended with lessons from building agentmemory, a persistent memory engine for AI coding agents.

This builds on Andrej Karpathy's original LLM Wiki idea file. Everything in the original still applies. This document adds what we learned running the pattern in production: what breaks at scale, what's missing, and what separates a wiki that stays useful from one that rots.

What the original gets right

The core insight is correct: stop re-deriving, start compiling. RAG retrieves and forgets. A wiki accumulates and compounds. The three-layer architecture (raw sources, wiki, schema) works. The operations (ingest, query, lint) cover the basics. If you haven't read the original, start there.
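To make the three layers and three operations concrete, here is a minimal sketch. Every name in it (`Wiki`, `raw_sources`, `pages`, `schema`, and the method bodies) is a hypothetical illustration, not code from agentmemory or the original idea file, and the "compile" step is a stand-in for what an LLM would actually do.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Toy three-layer store: raw sources, compiled pages, and a schema."""
    raw_sources: dict = field(default_factory=dict)  # layer 1: untouched inputs
    pages: dict = field(default_factory=dict)        # layer 2: compiled notes per topic
    schema: set = field(default_factory=set)         # layer 3: topics the wiki expects

    def ingest(self, source_id: str, text: str, topic: str) -> None:
        """Keep the raw text and compile one note onto the topic page."""
        self.raw_sources[source_id] = text
        # Assumption: the first sentence stands in for an LLM-written summary.
        note = text.split(".")[0].strip()
        self.pages.setdefault(topic, []).append(f"{note} [{source_id}]")

    def query(self, topic: str) -> list:
        """Answer from the compiled page instead of re-reading raw sources."""
        return self.pages.get(topic, [])

    def lint(self) -> list:
        """Report pages whose topic is missing from the schema."""
        return sorted(t for t in self.pages if t not in self.schema)


wiki = Wiki(schema={"alpha"})
wiki.ingest("src-1", "First fact. Extra detail.", "alpha")
wiki.ingest("src-2", "Second fact. Extra detail.", "beta")
print(wiki.query("alpha"))  # ['First fact [src-1]']
print(wiki.lint())          # ['beta']
```

The point of the sketch is the shape: `ingest` does the expensive work once, `query` reads what has already been compiled, and `lint` checks the pages against the schema.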

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

Software Development Principles by Masters

A curated collection of timeless software development principles from industry legends.

Which GGUF is right for me? (Opinionated)

Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962

In the meantime, use the largest quant that fully fits in your GPU's memory. If you can comfortably fit Q4_K_S, try a model with more parameters.
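The rule of thumb above can be sketched as a small helper. The function name, the `headroom_gib` parameter, and the file sizes in the example are all made up for illustration; check the real file sizes of the model you are downloading.

```python
from typing import Optional


def largest_fitting_quant(file_sizes_gib: dict, vram_gib: float,
                          headroom_gib: float = 0.0) -> Optional[str]:
    """Return the name of the largest file that fits in VRAM, or None.

    headroom_gib is a hypothetical knob for reserving memory for
    anything besides the weights; it defaults to no reservation.
    """
    budget = vram_gib - headroom_gib
    fitting = {name: size for name, size in file_sizes_gib.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


# Illustrative sizes only, not measurements of any real model.
sizes = {"Q4_K_S": 4.0, "Q5_K_M": 5.0, "Q8_0": 7.5}
print(largest_fitting_quant(sizes, vram_gib=6.0))  # Q5_K_M
```

If even the smallest file does not fit, the helper returns `None` rather than guessing.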

llama.cpp feature matrix

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix

Hack.lu CTF 2023 - Safest Eval (Python jail escape)

Challenge by: realansgar
Writeup by: rebane2001

Overview

The challenge consists of a simple Flask webapp that lets you eval arbitrary Python code in a jail in order to evaluate your solution to a leetcode-style programming challenge. The flag can be retrieved by running the /readflag setuid program. The source code was provided.

	CVE-2025-43520 - DarkSword

	1. cluster_read_ext and cluster_write_ext call cluster_io_type to determine what IO operation to perform
	2. cluster_io_type calls vm_map_get_upl with UPL_QUERY_OBJECT_TYPE to query type of the vm_object that backs the user-supplied virtual address range
	3. If this object is physically contiguous it returns IO_CONTIG, otherwise it returns IO_DIRECT or IO_COPY
	4. If cluster_io_type returns IO_CONTIG, cluster_[read|write]_ext will call the "contig" variant, cluster_[read|write]_contig
	5. cluster_[read|write]_contig then calls vm_map_get_upl a second time to get the UPL from the uio
	6. It then grabs the first physical page from the UPL using upl_phys_page and performs a physical copy
	7. This is a TOCTOU. An attacker can remap the virtual address range so that the region is no longer physically contiguous after the first call to vm_map_get_upl, causing an OOBR/OOBW to physmem
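The check-then-use shape of steps 1 to 7 can be illustrated with a toy model in plain Python. Nothing here is kernel code: `Region`, `racy_touched_pages`, and `snapshot_touched_pages` are hypothetical names, page numbers are plain integers, and the "remap" is simulated by a callback that runs between the two lookups. The single-lookup variant shows the generic mitigation pattern, not the actual fix for this CVE.

```python
def is_contiguous(pages):
    """True if every page number is exactly one above the previous."""
    return all(b == a + 1 for a, b in zip(pages, pages[1:]))


class Region:
    """Stand-in for a virtual range whose backing pages can be swapped."""
    def __init__(self, pages):
        self.pages = list(pages)


def racy_touched_pages(region, interleaved=lambda: None):
    contig = is_contiguous(region.pages)  # lookup 1: time of check
    interleaved()                         # window where the backing can change
    pages = region.pages                  # lookup 2: time of use
    if contig:
        # Trusts the earlier answer: walks forward from the first page only.
        return [pages[0] + i for i in range(len(pages))]
    return list(pages)


def snapshot_touched_pages(region, interleaved=lambda: None):
    pages = list(region.pages)            # single lookup feeds check and use
    contig = is_contiguous(pages)
    interleaved()
    # Assumption: real code must also keep the snapshot's pages pinned.
    if contig:
        return [pages[0] + i for i in range(len(pages))]
    return list(pages)


region = Region([10, 11, 12])
touched = racy_touched_pages(region, lambda: setattr(region, "pages", [7, 42, 99]))
print(touched)                                         # [7, 8, 9]
print([p for p in touched if p not in region.pages])   # [8, 9]
```

In the racy version, pages 8 and 9 were never part of the region, which is the toy equivalent of the out-of-bounds access described in step 7.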

	"""
	The most atomic way to train and run inference for a GPT in pure, dependency-free Python.
	This file is the complete algorithm.
	Everything else is just efficiency.

	@karpathy
	"""

	import os # os.path.exists
	import math # math.log, math.exp

	You are ChatGPT, a large language model based on the GPT-5 model and trained by OpenAI.
	Knowledge cutoff: 2024-06
	Current date: 2025-08-08

	Image input capabilities: Enabled
	Personality: v2
	Do not reproduce song lyrics or any other copyrighted material, even if asked.
	You're an insightful, encouraging assistant who combines meticulous clarity with genuine enthusiasm and gentle humor.
	Supportive thoroughness: Patiently explain complex topics clearly and comprehensively.
	Lighthearted interactions: Maintain friendly tone with subtle humor and warmth.