Accurate as of May 18, 2026.
Multi-Token Prediction (MTP) uses the model's built-in prediction heads to draft several upcoming tokens at once, then verifies them against the main model in a single pass. For Qwen3.6, this yields roughly 1.5–2× faster generation with no accuracy loss, because only drafted tokens that the main model agrees with are kept.
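The draft-then-verify loop can be sketched in a few lines of Python. This is a generic, greedy speculative-decoding sketch, not llama.cpp's implementation or Qwen3.6's actual MTP code. The names `main_next`, `draft`, `speculative_step`, and `generate` are hypothetical. `draft` stands in for the MTP heads, and the toy models are integer counters chosen only to make the behaviour visible. The sketch shows why there is no accuracy loss: every token that survives a step equals the main model's own greedy choice, so the output matches plain greedy decoding, and a step can emit several tokens instead of one.

```python
from typing import Callable, List

# Hypothetical interfaces for illustration only (not a real llama.cpp API):
#   main_next(seq) -> the main model's greedy next token after seq
#   draft(seq, k)  -> k cheap guesses for the tokens following seq (the "MTP heads")
MainNext = Callable[[List[int]], int]
Draft = Callable[[List[int], int], List[int]]


def speculative_step(seq: List[int], main_next: MainNext, draft: Draft, k: int) -> List[int]:
    """Return the tokens accepted in one greedy draft-then-verify step."""
    proposal = draft(seq, k)
    accepted: List[int] = []
    for tok in proposal:
        # A real engine scores all drafted positions in ONE batched forward
        # pass of the main model; this loop is sequential only for clarity.
        target = main_next(seq + accepted)
        if tok != target:
            accepted.append(target)  # main model's correction ends the step
            return accepted
        accepted.append(tok)
    # Every draft matched, so the same pass also yields one bonus token.
    accepted.append(main_next(seq + accepted))
    return accepted


def generate(prompt: List[int], main_next: MainNext, draft: Draft, k: int, n: int) -> List[int]:
    """Generate n tokens; the result equals plain greedy decoding with main_next."""
    seq = list(prompt)
    while len(seq) - len(prompt) < n:
        seq += speculative_step(seq, main_next, draft, k)
    return seq[len(prompt):len(prompt) + n]  # trim any overshoot from the last step


def greedy(prompt: List[int], main_next: MainNext, n: int) -> List[int]:
    """Baseline: one main-model call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(main_next(seq))
    return seq[len(prompt):]


# Toy models (hypothetical): the "main model" counts up modulo 10, and the
# drafter agrees except that it wrongly guesses 0 wherever 5 should come next.
def toy_main_next(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int], k: int) -> List[int]:
    out, last = [], seq[-1]
    for _ in range(k):
        guess = (last + 1) % 10
        if guess == 5:
            guess = 0  # deliberate mistake so some drafts are rejected
        out.append(guess)
        last = guess
    return out


if __name__ == "__main__":
    fast = generate([0], toy_main_next, toy_draft, k=3, n=12)
    print(fast)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
    print(fast == greedy([0], toy_main_next, 12))  # True
```

In this toy run the 12 tokens come out in 4 draft-then-verify steps instead of 12 sequential main-model calls, and the mistaken draft at 5 costs only a shorter step, not a wrong token. With sampling instead of greedy decoding, speculative decoding replaces the exact-match check with a rejection-sampling rule that preserves the main model's output distribution.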
This guide covers the Qwen3.6 27B and Qwen3.6 35B-A3B (MoE) models. As of May 2026, MTP support has been merged into llama.cpp, so no fork is required.