

smarvr / Qwen3.5_ttft_tps_warmup_benchmark.md

Created March 10, 2026 04:56

Qwen3.5 Models (0.8B, 2B, 4B, 9B, 27B, 35B A3B) up to 400k Context (TTFT, Tok/s, Warmup/Reply) on a 4090

To be transparent: I think I messed something up when offloading, and I will try again in the future.

Note: scroll sideways to see the remaining columns.

Context: 2048 tokens

| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Follow-ups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.044 | 0.619 | 375.926 | 232/610 | 216/1,500 | GPU | 2,319 MiB | 0.041 | 437.758 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.028 | 0.849 | 272.880 | 232/610 | 224/1,500 | GPU | 3,254 MiB | 0.040 | 323.571 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.033 | 1.069 | 280.921 | 232/610 | 291/1,500 | GPU | 3,027 MiB | 0.051 | 326.293 | 3/3 | OK / KV q8_0 |
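The Tokens/s column is not defined in the table. A quick check suggests it behaves like a decode rate: output tokens divided by (Duration − TTFT), rather than output tokens divided by total Duration. The sketch below tests that guess against the three rows above. The formula, and the helper names `decode_rate`, `overall_rate`, and `relative_error`, are assumptions for illustration, not the benchmark's stated method.

```python
# Hypothetical check of how the "Tokens/s" column may be computed.
# Assumption: Tokens/s = output tokens / (Duration - TTFT), i.e. generation
# speed after the first token arrives. This is inferred from the numbers only.

# (model, TTFT s, Duration s, reported Tokens/s, output tokens), copied from the 2048 table
ROWS = [
    ("Qwen3.5-0.8B-Q4_K_M", 0.044, 0.619, 375.926, 216),
    ("Qwen3.5-0.8B-bf16", 0.028, 0.849, 272.880, 224),
    ("Qwen3.5-2B-Q4_K_M", 0.033, 1.069, 280.921, 291),
]


def decode_rate(ttft_s: float, duration_s: float, output_tokens: int) -> float:
    """Output tokens per second, excluding the time to first token (assumed definition)."""
    return output_tokens / (duration_s - ttft_s)


def overall_rate(duration_s: float, output_tokens: int) -> float:
    """Output tokens per second over the whole request, for comparison."""
    return output_tokens / duration_s


def relative_error(estimate: float, reported: float) -> float:
    """Fractional difference between an estimate and the reported value."""
    return abs(estimate - reported) / reported


if __name__ == "__main__":
    for name, ttft, duration, reported, out in ROWS:
        dec = decode_rate(ttft, duration, out)
        tot = overall_rate(duration, out)
        print(
            f"{name}: reported {reported:.1f}, "
            f"decode-rate guess {dec:.1f} ({relative_error(dec, reported):.2%} off), "
            f"whole-request rate {tot:.1f} ({relative_error(tot, reported):.2%} off)"
        )
```

For these rows the decode-rate guess lands within about 0.1% of the reported value, while dividing by the full Duration is off by roughly 3 to 7%. The small remaining gap is consistent with rounding of the timings, but the exact definition should be confirmed against the benchmark script.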