Skip to content

Instantly share code, notes, and snippets.

@xeoncross
Last active June 29, 2026 20:49
Show Gist options
  • Select an option

  • Save xeoncross/6798e198e722687ea75eba2ed451d9da to your computer and use it in GitHub Desktop.

Select an option

Save xeoncross/6798e198e722687ea75eba2ed451d9da to your computer and use it in GitHub Desktop.
Best local LLM model for VSCode and general programming / coding on a macbook

https://quesma.com/blog/qwen-36-is-awesome/

It comes in two variants, a mixture-of-experts model Qwen 3.6 35B A3B, and a dense Qwen 3.6 27B - slower, but more powerful. The one I recommend!

Note: I've found the 35B-A3B model to run at least 2x fast which is nice when you just need quick results for focused tasks. For larger or multi-step processes the 27B model is worth the wait. Here is the local setup guide to get it working with VSCode via Cline extension.

1. Install llama on MacOS

brew install llama.cpp

2. Download and run model

llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
    --spec-type draft-mtp -ngl 999 -fa on -c 65536 --jinja --port 8080

The context is set to 65k, but you can increase this if you need longer (but slower) sessions.

If you change the port just update the below.

3. Setup Cline:

  • API Provider: Select OpenAI Compatible.
  • Base URL: Enter http://127.0.0.1:8080/v1
  • API Key: Since your local server doesn't use authentication, you can type anything (e.g., local or 123)
  • Model ID: unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0

If you install a different model you can see the correct name running curl http://127.0.0.1:8080/v1/models

4. Performance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment