smarvr / Qwen3.5_ttft_tps_warmup_benchmark.md
Created March 10, 2026 04:56
Qwen3.5 Models (0.8B, 2B, 4B, 9B, 27B, 35B A3B) up to 400k Context (TTFT, Tok/s, Warmup/Reply) on a 4090

In the interest of transparency: I think I messed something up when offloading, so I will try again in the future.

Note: scroll sideways to see the other columns.
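The gist does not say how the Tokens/s column is computed. One plausible reading, offered here only as an assumption, is decode throughput: output tokens divided by the time remaining after the first token arrives (Duration minus TTFT). The helper name `decode_tokens_per_second` is hypothetical. A minimal sketch using the figures from the first 2048-context row:

```python
def decode_tokens_per_second(output_tokens: int, duration_s: float, ttft_s: float) -> float:
    """Assumed definition: tokens generated per second after the first token arrives."""
    decode_time = duration_s - ttft_s
    if decode_time <= 0:
        raise ValueError("duration must be greater than TTFT")
    return output_tokens / decode_time


# Figures taken from the Qwen3.5-0.8B-Q4_K_M row (2048 context):
# 216 output tokens, 0.619 s duration, 0.044 s TTFT.
rate = decode_tokens_per_second(216, 0.619, 0.044)
print(f"{rate:.1f} tokens/s")
```

Under this assumption the result lands within about one token per second of the reported 375.926. The small gap would be consistent with the table showing rounded timings, but the benchmark's actual formula may differ.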

2048

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen3.5-0.8B-Q4_K_M | 0.044 | 0.619 | 375.926 | 232/610 | 216/1,500 | GPU | 2,319 MiB | 0.041 | 437.758 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.028 | 0.849 | 272.880 | 232/610 | 224/1,500 | GPU | 3,254 MiB | 0.040 | 323.571 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.033 | 1.069 | 280.921 | 232/610 | 291/1,500 | GPU | 3,027 MiB | 0.051 | 326.293 | 3/3 | OK / KV q8_0 |