https://quesma.com/blog/qwen-36-is-awesome/

It comes in two variants, a mixture-of-experts model Qwen 3.6 35B A3B, and a dense Qwen 3.6 27B - slower, but more powerful. The one I recommend!

Note: I've found the 35B-A3B model to run at least 2x fast which is nice when you just need quick results for focused tasks. For larger or multi-step processes the 27B model is worth the wait. Here is the local setup guide to get it working with VSCode via Cline extension.

1. Install llama on MacOS

brew install llama.cpp

2. Download and run model

llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
    --spec-type draft-mtp -ngl 999 -fa on -c 65536 --jinja --port 8080

The context is set to 65k, but you can increase this if you need longer (but slower) sessions.

If you change the port just update the below.

3. Setup Cline:

API Provider: Select OpenAI Compatible.
Base URL: Enter http://127.0.0.1:8080/v1
API Key: Since your local server doesn't use authentication, you can type anything (e.g., local or 123)
Model ID: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0

If you install a different model you can see the correct name running curl http://127.0.0.1:8080/v1/models

4. Performance

10 t/s on a M1 Max
18 t/s on a M5 Max
15 t/s on Intel Arc B70

xeoncross/qwen3.6-27b-cline-vscode.md

Select an option