To be transparent: I think I messed something up when offloading, so I will try again in the future.
Note: scroll sideways to see the other columns.
| Model | TTFT (s) | Duration (s) | Tokens/s | Input (Tokens/Characters) | Output Tokens (Total/Limit) | Offload Mode | VRAM/Memory Used | Warm Avg TTFT (s) | Warm Avg Tokens/s | Warm Followups | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-0.8B-Q4_K_M | 0.044 | 0.619 | 375.926 | 232/610 | 216/1,500 | GPU | 2,319 MiB | 0.041 | 437.758 | 3/3 | OK / KV q8_0 |
| Qwen3.5-0.8B-bf16 | 0.028 | 0.849 | 272.880 | 232/610 | 224/1,500 | GPU | 3,254 MiB | 0.040 | 323.571 | 3/3 | OK / KV q8_0 |
| Qwen3.5-2B-Q4_K_M | 0.033 | 1.069 | 280.921 | 232/610 | 291/1,500 | GPU | 3,027 MiB | 0.051 | 326.293 | 3/3 | OK / KV q8_0 |