@WolframRavenwolf
Last active August 3, 2025 02:11
HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM)

Here's a simple way for Claude Code users to switch from the costly Claude models to Qwen3-Coder, the newly released state-of-the-art open-weights coding model, by routing requests through a local LiteLLM proxy to OpenRouter.

This process is quite universal and can be easily adapted to suit your needs. Feel free to explore other models (including local ones) as well as different providers and coding agents.

I'm sharing what works for me. This guide is set up so you can just copy and paste the commands into your terminal.

1. Create the LiteLLM directory and enter it (we'll create the necessary files ourselves, so there's no need to clone the repo):

#git clone --depth 1 https://github.com/BerriAI/litellm.git
mkdir -p litellm && cd litellm

Note

This guide previously required manually building the LiteLLM container image due to missing updates in the official online version. Now that these updates have been integrated and the container image is current, manual building is no longer necessary. That's why we no longer need to download the repository; instead, we'll create the necessary files ourselves, based on the original files from the repository.

2. Create a .env file with your OpenRouter API key (make sure to insert your own API key!):

cat <<EOF >.env
LITELLM_MASTER_KEY = "sk-1234"

# OpenRouter
OPENROUTER_API_KEY = "sk-or-v1-…" # 🚩
EOF
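
Since this file now holds a live API key, it's worth restricting who can read it. A minimal precaution (plain file permissions, nothing LiteLLM-specific):

chmod 600 .env # readable and writable only by your user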

3. Create a config.yaml file that replaces Anthropic models with Qwen3-Coder (with all the recommended parameters):

cat <<\EOF >config.yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder" # Qwen/Qwen3-Coder-480B-A35B-Instruct
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
EOF
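
The same model_list pattern works for any backend LiteLLM supports, which is how you'd swap in a local model instead of OpenRouter. This is a hedged sketch only: the model tag, the Ollama address, and the use of host.docker.internal are assumptions, not part of this guide.

cat <<\EOF >config.yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "ollama_chat/qwen2.5-coder:32b" # assumed local model tag served by Ollama
      api_base: "http://host.docker.internal:11434" # Ollama running on the Docker host
      temperature: 0.7
EOF

On Linux, host.docker.internal may need an extra_hosts entry in docker-compose.yml (or use the host's IP address instead).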

4. Create a docker-compose.yml file that loads config.yaml (it's easier to just create a finished one with all the required changes than to edit the original file):

cat <<\EOF >docker-compose.yml
services:
  litellm:
    #build:
    #  context: .
    #  args:
    #    target: runtime
    ############################################################################
    command:
      - "--config=/app/config.yaml"
    container_name: litellm
    hostname: litellm
    image: ghcr.io/berriai/litellm:main-stable
    restart: unless-stopped
    volumes:
      - ./config.yaml:/app/config.yaml
    ############################################################################
    ports:
      - "4000:4000" # Map the container port to the host, change the host port if necessary
    environment:
      DATABASE_URL: "postgresql://llmproxy:dbpassword9090@db:5432/litellm"
      STORE_MODEL_IN_DB: "True" # allows adding models to proxy via UI
    env_file:
      - .env # Load local .env file
    depends_on:
      - db  # Indicates that this service depends on the 'db' service, ensuring 'db' starts first
    healthcheck:  # Defines the health check configuration for the container
      test: [ "CMD-SHELL", "wget --no-verbose --tries=1 http://localhost:4000/health/liveliness || exit 1" ]  # Command to execute for health check
      interval: 30s  # Perform health check every 30 seconds
      timeout: 10s   # Health check command times out after 10 seconds
      retries: 3     # Retry up to 3 times if health check fails
      start_period: 40s  # Wait 40 seconds after container start before beginning health checks

  db:
    image: postgres:16
    restart: always
    container_name: litellm_db
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: llmproxy
      POSTGRES_PASSWORD: dbpassword9090
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data # Persists Postgres data across container restarts
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -d litellm -U llmproxy"]
      interval: 1s
      timeout: 5s
      retries: 10

volumes:
  postgres_data:
    name: litellm_postgres_data # Named volume for Postgres data persistence
EOF

5. Run LiteLLM:

docker compose up -d
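
To confirm the proxy came up cleanly before pointing Claude Code at it, you can watch the logs and hit the same liveliness endpoint the healthcheck uses (nothing here beyond the setup above):

docker compose logs -f litellm # follow startup output, Ctrl+C to stop
curl http://localhost:4000/health/liveliness # should report the proxy as alive
curl -H "Authorization: Bearer sk-1234" http://localhost:4000/v1/models # lists the model entries from config.yaml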

6. Export environment variables that make Claude Code use Qwen3-Coder via LiteLLM (remember to execute this before starting Claude Code or include it in your shell profile (.zshrc, .bashrc, etc.) for persistence):

export ANTHROPIC_AUTH_TOKEN=sk-1234
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_MODEL=openrouter/qwen/qwen3-coder
export ANTHROPIC_SMALL_FAST_MODEL=openrouter/qwen/qwen3-coder
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 # Optional: Disables telemetry, error reporting, and auto-updates
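
Before launching Claude Code, you can send one request along the same path it will use. This is a sketch that assumes your LiteLLM version exposes the Anthropic-compatible /v1/messages route (the one Claude Code talks to) and accepts the model string configured above:

curl http://localhost:4000/v1/messages \
  -H "Authorization: Bearer sk-1234" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "openrouter/qwen/qwen3-coder",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Reply with a one-line greeting."}]
  }'

A JSON response containing Qwen3-Coder's reply means the whole chain (Anthropic-format request, LiteLLM, OpenRouter) is working.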

7. Start Claude Code and it'll use Qwen3-Coder via OpenRouter instead of the expensive Claude models (you can check with the /model command that it's using a custom model):

claude

8. Optional: Add an alias to your shell profile (.zshrc, .bashrc, etc.) to make it easier to use (e.g. qlaude for "Claude with Qwen"):

alias qlaude='ANTHROPIC_AUTH_TOKEN=sk-1234 ANTHROPIC_BASE_URL=http://localhost:4000 ANTHROPIC_MODEL=openrouter/qwen/qwen3-coder ANTHROPIC_SMALL_FAST_MODEL=openrouter/qwen/qwen3-coder claude'
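
If you want the shortcut to fail fast when the proxy isn't running, a shell function can replace the alias; a sketch with the same environment variables as above plus a reachability check against the liveliness endpoint:

qlaude() {
  # Bail out early if the LiteLLM proxy isn't reachable on port 4000
  if ! curl -sf http://localhost:4000/health/liveliness >/dev/null; then
    echo "LiteLLM proxy not reachable at http://localhost:4000 - start it with 'docker compose up -d'" >&2
    return 1
  fi
  ANTHROPIC_AUTH_TOKEN=sk-1234 \
  ANTHROPIC_BASE_URL=http://localhost:4000 \
  ANTHROPIC_MODEL=openrouter/qwen/qwen3-coder \
  ANTHROPIC_SMALL_FAST_MODEL=openrouter/qwen/qwen3-coder \
  claude "$@"
}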

Have fun and happy coding!

PS: There are other ways to do this using dedicated Claude Code proxies, of which there are quite a few on GitHub. Before implementing this with LiteLLM, I reviewed some of them, but they all had issues, such as not handling the recommended inference parameters. I prefer using established projects with a solid track record and a large user base, which is why I chose LiteLLM. Open Source offers many options, so feel free to explore other projects and find what works best for you.

Tip

  • When using OpenRouter: Head over to OpenRouter's Settings page and set your Allowed Providers to those you prefer, or add any you want to avoid to Ignored Providers. By adding Alibaba to Ignored Providers, you can prevent unexpected costs.

    It's also a good idea to select only one Allowed Provider to test its performance. If it doesn't meet your needs, you can easily switch to another. The default setting lets OpenRouter choose for you, which is convenient, but it may select a suboptimal provider (too expensive, too slow, or lacking features). Provider routing can also be pinned in the proxy config itself; see the sketch after these tips.

  • When using another model: Your model should have a context window of at least 200,000 tokens, as Claude Code likely expects this for its auto-compacting feature when nearing the limit. With a smaller context window, crucial information might scroll out before Claude Code's auto-compact kicks in. To prevent this, regularly run /compact manually to maintain context.
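
If you'd rather pin providers in the proxy config than in OpenRouter's web settings, OpenRouter accepts a provider routing object in the request body. A hedged sketch only: it assumes LiteLLM forwards extra_body to OpenRouter unchanged, and "SomeProvider" is a placeholder to replace with your own choice.

cat <<\EOF >config.yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder" # Qwen/Qwen3-Coder-480B-A35B-Instruct
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
      extra_body:
        provider:                  # OpenRouter provider routing
          order: ["SomeProvider"]  # placeholder - put your preferred provider(s) here
          allow_fallbacks: false   # fail instead of silently switching providers
EOF

Restart the container afterwards (docker compose restart litellm) so the new config is picked up.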

u/krazzmann:

  • I actually installed LiteLLM system-wide with uv: uv tool install litellm[proxy]. Then you can also add it to your system init process to start it at boot time. (A sketch of running it against the same config.yaml follows after this list.)

  • If you want to use the VS Code extension with this Qwen hack, then edit your VS Code settings.json and add:

    "terminal.integrated.env.osx": {
        "ANTHROPIC_API_KEY": "sk-1234",
        "ANTHROPIC_BASE_URL": "http://localhost:4000",
        "ANTHROPIC_MODEL": "openrouter/qwen/qwen3-coder",
        "ANTHROPIC_SMALL_FAST_MODEL": "openrouter/qwen/qwen3-coder",
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
    }

Use terminal.integrated.env.linux or terminal.integrated.env.windows on Linux or Windows, respectively.
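
For the uv-installed variant mentioned in the first bullet, the litellm CLI can serve the same config.yaml without Docker. A sketch that assumes the keys from .env are exported in the environment and skips the Postgres-backed features of the compose setup:

export LITELLM_MASTER_KEY="sk-1234"
export OPENROUTER_API_KEY="sk-or-v1-…" # 🚩 your own key, as in .env
litellm --config ~/litellm/config.yaml --port 4000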

@makerstorage

Didn't work for me:
API Error (500 {"error":{"message":"Error calling litellm.acompletion for non-Anthropic model: litellm.NotFoundError: NotFoundError: OpenrouterException - {"error":{"message":"No endpoints found that support cache control","code":404}}","type":"None","param":"None","code":"500"}}) · Retrying in 1 seconds… (attempt 1/10)

@WolframRavenwolf
Author

Didn't work for me: API Error (500 {"error":{"message":"Error calling litellm.acompletion for non-Anthropic model: litellm.NotFoundError: NotFoundError: OpenrouterException - {"error":{"message":"No endpoints found that support cache control","code":404}}","type":"None","param":"None","code":"500"}}) · Retrying in 1 seconds… (attempt 1/10)

This issue occurred with older versions of LiteLLM. The latest version fixed it.

If you followed my guide exactly, you should already have the latest version. If you installed LiteLLM differently, please upgrade or follow the guide closely.
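
With the Docker setup from this guide, upgrading is just a matter of pulling the current image and recreating the container:

cd litellm                   # the directory created in step 1
docker compose pull litellm  # fetch the latest main-stable image
docker compose up -d         # recreate the container with the new image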

@WolframRavenwolf
Author

@olafgeibig took my humble foundations and elevated them to new heights - he truly masters the LiteLLM craft. This is the definitive guide on using the three new @QwenLM models with @wandb's new inference service in Claude Code:

https://gist.github.com/olafgeibig/7cdaa4c9405e22dba02dc57ce2c7b31f

@RomeoV

RomeoV commented Jul 29, 2025

Does it also have internet access? I have struggled with setting that up when using custom claude code backends...
