Understanding SGLang's Radix Cache, the LeetCode Way

Overview

What is Radix Cache?

When an LLM processes a prompt, it computes a key vector and a value vector for every token at every attention layer; together these form the KV cache. If many requests share the same system prompt, recomputing that prefix's KV cache from scratch for each request is wasteful. Radix Cache stores the KV cache of already-computed prefixes in a radix tree and reuses it across requests, which is one of the main reasons SGLang achieves high throughput.
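To make the idea concrete, here is a minimal sketch of a radix tree keyed by token ids. It assumes each token's KV entry is identified by a single integer "slot" index. The names `Node`, `RadixTree`, `match_prefix`, and `insert` are hypothetical and chosen for illustration; this is not SGLang's or mini-sglang's actual code, and it omits the reference counting and eviction a production cache needs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Edge label: the run of token ids leading into this node (compressed path).
    key: list[int] = field(default_factory=list)
    # Hypothetical KV-cache slot indices, one per token in `key`.
    value: list[int] = field(default_factory=list)
    # Children indexed by the first token of their edge label.
    children: dict[int, Node] = field(default_factory=dict)


def common_len(a: list[int], b: list[int]) -> int:
    """Length of the shared prefix of two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class RadixTree:
    """Illustrative radix tree mapping token prefixes to cached KV slots."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: list[int]) -> list[int]:
        """Return the KV slots of the longest cached prefix of `tokens`."""
        node, matched, i = self.root, [], 0
        while i < len(tokens) and tokens[i] in node.children:
            child = node.children[tokens[i]]
            n = common_len(child.key, tokens[i:])
            matched.extend(child.value[:n])
            if n < len(child.key):  # diverged (or ran out) mid-edge: stop here
                break
            node, i = child, i + n
        return matched

    def _split(self, child: Node, n: int) -> Node:
        """Split `child`'s edge after n tokens and return the new middle node."""
        mid = Node(key=child.key[:n], value=child.value[:n])
        child.key, child.value = child.key[n:], child.value[n:]
        mid.children[child.key[0]] = child
        return mid

    def insert(self, tokens: list[int], slots: list[int]) -> int:
        """Cache tokens -> slots; return how many tokens were already cached."""
        node, i = self.root, 0
        while i < len(tokens):
            first = tokens[i]
            if first not in node.children:
                # No shared edge: hang the remaining tokens off as a new leaf.
                node.children[first] = Node(key=tokens[i:], value=slots[i:])
                return i
            child = node.children[first]
            n = common_len(child.key, tokens[i:])
            if n < len(child.key):
                # Partial edge match: split so the shared part becomes its own node.
                child = self._split(child, n)
                node.children[first] = child
            node, i = child, i + n
        return i


tree = RadixTree()
system = [1, 2, 3, 4]  # token ids of a shared system prompt
tree.insert(system + [10, 11], [100, 101, 102, 103, 104, 105])
print(tree.match_prefix(system + [20, 21]))  # [100, 101, 102, 103]
```

In this sketch, a second request that starts with the same system prompt only needs fresh KV slots for its last two tokens: `match_prefix` returns the four cached slots, and a subsequent `insert` splits the original edge so both requests share one `[1, 2, 3, 4]` node. Compressing runs of tokens into a single edge is what distinguishes a radix tree from a plain trie, and it keeps the tree shallow when prompts are long.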

yangyonggit

Why Read mini-sglang Instead of SGLang Directly?