Understanding SGLang's Radix Cache, the LeetCode Way

Overview

What is Radix Cache?

When an LLM processes a prompt, it computes Key and Value vectors for every token at every attention layer. Together these make up the KV cache. If many requests share the same system prompt, recomputing that prompt's KV cache from scratch for each request is wasted work. Radix Cache stores the KV cache of already-computed prefixes in a radix tree keyed by token sequence, so a new request only needs to compute the tokens that follow its longest cached prefix. This reuse is one of the main reasons SGLang achieves high throughput.
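To make the idea concrete, here is a minimal sketch of prefix reuse. It is not SGLang's or mini-sglang's code: `ToyPrefixCache`, `TrieNode`, and `kv_slot` are hypothetical names, and an integer slot stands in for the real KV tensors. For brevity it uses a plain token-level trie with one node per token. A radix tree holds the same information but merges chains of single-child nodes into one edge.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    # Maps the next token id to the child node reached by that token.
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    # Hypothetical stand-in for the cached KV entry of the token leading here.
    kv_slot: Optional[int] = None


class ToyPrefixCache:
    """Toy prefix cache: one trie node per token (assumption, not SGLang's layout)."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.next_slot = 0

    def match_prefix(self, tokens: List[int]) -> List[int]:
        """Return the cached slots for the longest prefix of `tokens` already stored."""
        node = self.root
        slots: List[int] = []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break  # first token that is not cached; the rest must be computed
            slots.append(child.kv_slot)
            node = child
        return slots

    def insert(self, tokens: List[int]) -> int:
        """Store `tokens`; return how many tokens were new and needed 'computing'."""
        node = self.root
        computed = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = TrieNode(kv_slot=self.next_slot)
                self.next_slot += 1
                node.children[tok] = child
                computed += 1
            node = child
        return computed


# Two requests that share a 4-token system prompt (token ids are made up).
system_prompt = [101, 7, 7, 42]
cache = ToyPrefixCache()
first_cost = cache.insert(system_prompt + [5, 6])   # all 6 tokens are new
reused = cache.match_prefix(system_prompt + [9])    # the 4 shared tokens hit
second_cost = cache.insert(system_prompt + [9])     # only 1 token is new
```

In this toy run the second request reuses the four system-prompt entries and only has to compute its one remaining token, which is the saving the paragraph above describes.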

Why Read mini-sglang Instead of SGLang Directly?