Wei w32zhong


Understanding SGLang's Radix Cache, the LeetCode Way

Overview

What is Radix Cache?

When an LLM processes a prompt, it computes a Key and Value vector for every token — the KV cache. If many requests share the same system prompt, recomputing its KV cache from scratch each time is wasteful. Radix Cache stores these computed prefixes in a Radix Tree and reuses them across requests, which is one of the main reasons SGLang achieves high throughput.
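The prefix-sharing idea above can be sketched as a small radix tree over token ids. This is only an illustrative sketch, not SGLang's or mini-sglang's actual code: the names `Node`, `PrefixCache`, `insert`, and `match_prefix` are hypothetical, and the tree tracks token sequences only, leaving out the KV storage and eviction a real cache would need.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Token ids on the edge leading into this node (hypothetical layout).
    tokens: tuple = ()
    # Children keyed by the first token id of their edge.
    children: dict = field(default_factory=dict)


def _common_len(a, b):
    """Length of the shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        tokens = tuple(tokens)
        while matched < len(tokens):
            child = node.children.get(tokens[matched])
            if child is None:
                break
            k = _common_len(child.tokens, tokens[matched:])
            matched += k
            if k < len(child.tokens):
                break  # diverged in the middle of an edge
            node = child
        return matched

    def insert(self, tokens):
        """Insert a token sequence, splitting an edge where sequences diverge."""
        node, i = self.root, 0
        tokens = tuple(tokens)
        while i < len(tokens):
            child = node.children.get(tokens[i])
            if child is None:
                # No edge starts with this token: add the remainder as a leaf.
                node.children[tokens[i]] = Node(tokens[i:])
                return
            k = _common_len(child.tokens, tokens[i:])
            if k < len(child.tokens):
                # Split: keep the shared part on a new intermediate node.
                mid = Node(child.tokens[:k], {child.tokens[k]: child})
                child.tokens = child.tokens[k:]
                node.children[tokens[i]] = mid
                child = mid
            node, i = child, i + k
```

In this sketch, the value returned by `match_prefix` is the number of leading prompt tokens whose KV entries would not need to be recomputed. Two requests that share a system prompt end up sharing one edge of the tree, and the edge is split only at the token where they diverge.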

Why Read mini-sglang Instead of SGLang Directly?

@w32zhong
w32zhong / HAProxy
Last active May 26, 2026 21:07
PVE scripts
global
daemon
maxconn 10000
defaults
mode tcp
timeout connect 5s
timeout client 1h
timeout server 1h
[package]
name = "chinese_search"
version = "0.1.0"
edition = "2024"
[dependencies]
jieba-rs = "0.9.0"
tantivy = "0.26.0"
tantivy-jieba = "0.19.0"
@w32zhong
w32zhong / mTLS-notes.md
Last active March 28, 2026 17:38
Website and mTLS certs

Gateway

app/gateway/discovery.js


app/gateway/apisix.yml


app/gateway/entrypoint.sh

@w32zhong
w32zhong / README_hfd.md
Created December 5, 2025 02:32 — forked from padeoe/README_hfd.md
CLI tool for downloading Hugging Face models and datasets with aria2/wget: hfd

🤗Huggingface Model Downloader

Note

(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume, enabling skipping of downloaded files, while removing the git clone dependency to accelerate file list retrieval.

Because the official huggingface-cli lacks multi-threaded download support and hf_transfer has inadequate error handling, this command-line tool uses curl and aria2c for fast, robust downloading of models and datasets.

Features

  • ⏯️ Resume from breakpoint: You can re-run it or Ctrl+C anytime.
@w32zhong
w32zhong / config.json
Last active September 28, 2025 17:03
sglang
{
"architectures": [
"Qwen3ForCausalLMEagle"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"head_dim": 128,
"hidden_act": "silu",
@w32zhong
w32zhong / modular-max.py
Last active July 14, 2025 17:35
vllm compared to nano-vllm
## Setup
# conda create -n modular python=3.11
# uv pip install modular --extra-index-url https://download.pytorch.org/whl/cpu --index-url https://dl.modular.com/public/nightly/python/simple/ --index-strategy unsafe-best-match --prerelease allow
# conda install -c conda-forge gcc=12.1.0
model_path = 'Qwen/Qwen2.5-0.5B'
import time
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig
@w32zhong
w32zhong / code.js
Created June 4, 2025 13:07 — forked from iiLaurens/code.js
Get all clickable elements on a page
window.scrollTo(0, 0)
var bodyRect = document.body.getBoundingClientRect();
var items = Array.prototype.slice.call(
document.querySelectorAll('*')
).map(function(element) {
var rect=element.getBoundingClientRect();
return {
element: element,
include: (element.tagName === "BUTTON" || element.tagName === "A" || (element.onclick != null) || window.getComputedStyle(element).cursor == "pointer"),
@w32zhong
w32zhong / train.py
Created May 17, 2025 19:42 — forked from ddh0/train.py
Janky pretraining script for small llama models using HF fineweb - modify according to your needs
import os
import torch
import psutil
import datasets
import glob
from transformers import (
AutoTokenizer, LlamaConfig, LlamaForCausalLM, Trainer, TrainingArguments,
DataCollatorForLanguageModeling
)
@w32zhong
w32zhong / Dockerfile
Last active February 28, 2025 16:34
Example dockerfile
FROM nvcr.io/nvidia/pytorch:23.11-py3
WORKDIR /workspace
RUN pip install -r r1.txt
ADD requirements.txt r2.txt
# FlashAttention-2 compatibility copied from https://github.com/Dao-AILab/flash-attention/issues/836#issuecomment-1951433985
RUN pip install flash-attn==2.5.1.post1
RUN apt update && apt install -y tmux git-lfs
RUN pip install nvitop
ADD . myproject
WORKDIR /workspace/myproject