https://quesma.com/blog/qwen-36-is-awesome/
Qwen 3.6 comes in two variants: a mixture-of-experts model, Qwen 3.6 35B A3B, and a dense model, Qwen 3.6 27B, which is slower but more powerful. The dense model is the one I recommend!
Note: I've found the 35B-A3B model to run at least 2x faster, which is nice when you just need quick results for focused tasks. For larger or multi-step processes, the 27B model is worth the wait. Here is the local setup guide to get it working with VS Code via the Cline extension.
brew install llama.cpp
llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
--spec-type draft-mtp -ngl 999 -fa on -c 65536 --jinja --port 8080
The context is set to 65,536 tokens (-c 65536), but you can increase this if you need longer (but slower) sessions.
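If you expect to tweak the context size or port often, it can help to build the launch command from parameters. The following is a minimal sketch, not part of the original guide: the function name server_cmd and its argument order are made up for illustration, and the flags are copied unchanged from the command above.

```shell
#!/bin/sh
# Hypothetical helper: print the llama-server launch command.
# Arguments (all optional): <model> <context-size> <port>.
# Defaults mirror the command shown earlier in this guide.
server_cmd() {
  model="${1:-unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0}"
  ctx="${2:-65536}"
  port="${3:-8080}"
  printf 'llama-server -hf %s --spec-type draft-mtp -ngl 999 -fa on -c %s --jinja --port %s\n' \
    "$model" "$ctx" "$port"
}

# Print the default command so it can be inspected before running it.
server_cmd

# Example: keep the default model, double the context, move to port 9000.
server_cmd "" 131072 9000

# To actually start the server, run the printed line, e.g.:
#   eval "$(server_cmd "" 131072 9000)"
```

Printing the command instead of running it directly makes it easy to check what will be launched before committing to a long model load.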
If you change the port, update the Base URL in the Cline settings below to match.
- API Provider: Select OpenAI Compatible.
- Base URL: Enter http://127.0.0.1:8080/v1
- API Key: Since your local server doesn't use authentication, you can type anything (e.g., local or 123).
- Model ID: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0
If you install a different model, you can find the correct name by running: curl http://127.0.0.1:8080/v1/models
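The raw JSON from that endpoint can be noisy, so here is a rough sketch for pulling out just the values Cline needs. The helper names (base_url, extract_model_ids, list_models) are hypothetical, and the parsing assumes the usual OpenAI-style response in which each model appears with an "id" field and the ID contains no escaped quotes.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from llama.cpp or Cline.
PORT="${PORT:-8080}"

# Build the Base URL for Cline from a port number.
base_url() {
  printf 'http://127.0.0.1:%s/v1\n' "$1"
}

# Read a /v1/models JSON response on stdin and print each "id" value.
# Uses grep/sed only, to avoid extra dependencies.
extract_model_ids() {
  grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/^"id"[[:space:]]*:[[:space:]]*"(.*)"$/\1/'
}

# Query the running server; requires llama-server to be up on $PORT.
list_models() {
  curl -s "$(base_url "$PORT")/models" | extract_model_ids
}

# Usage, once the server is running:
#   list_models
```

If jq is available, it is likely a more robust way to parse the response than this grep/sed pass.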
- 10 t/s on an M1 Max
- 18 t/s on an M5 Max
- 15 t/s on an Intel Arc B70
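To get a feel for what these numbers mean in practice, the sketch below turns tokens per second into an approximate wait time. It assumes the figures describe generation speed, and the 2000-token reply size is an arbitrary value picked for illustration; seconds_for is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: whole seconds needed to generate a reply.
# Arguments: <tokens> <tokens-per-second>. Integer division rounds down.
seconds_for() {
  echo $(( $1 / $2 ))
}

# 2000 tokens is an assumed reply size, not a measured value.
for tps in 10 18 15; do
  echo "$tps t/s: about $(seconds_for 2000 "$tps") s for 2000 tokens"
done
```

This ignores prompt processing time, so real waits for long contexts will be somewhat higher.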