
MLX LM with the OpenAI Python Package

1. Install

Install MLX LM and openai:

pip install mlx-lm openai

2. Run the MLX LM server

Run the MLX LM server with:

mlx_lm.server
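
By default the server listens on localhost:8080, which matches the base_url used below. To preload a specific model or change the port, pass the --model and --port flags (run mlx_lm.server --help for the full list), for example:

mlx_lm.server --model mlx-community/Meta-Llama-3-8B-Instruct-4bit --port 8080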

3. Make the HTTP request

Create a Python script (e.g. test.py) and include the following:

import openai

openai_client = openai.OpenAI(
    api_key="placeholder-api", base_url="http://localhost:8080"
)

response = openai_client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say this is a test!"},
    ],
)

# Process the response, e.g. print the generated text.
print(response.choices[0].message.content)

Run the script with python test.py.
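
Recent versions of MLX LM also support the OpenAI streaming protocol. A minimal streaming sketch, reusing the client and model from the script above (pass stream=True and iterate over the chunks):

stream = openai_client.chat.completions.create(
    model="mlx-community/Meta-Llama-3-8B-Instruct-4bit",
    messages=[{"role": "user", "content": "Say this is a test!"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the generated text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()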

@madroidmaq

4. Curl

curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2-0.5B-Instruct-MLX",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

Response:

{
	"id": "chatcmpl-fab9cd3b-c56e-4ec8-9d74-b11418ed04f8",
	"system_fingerprint": "fp_7f85614c-8626-45bd-9e37-570fd21e4062",
	"object": "chat.completions",
	"model": "Qwen/Qwen2-0.5B-Instruct-MLX",
	"created": 1720402910,
	"choices": [{
		"index": 0,
		"logprobs": {
			"token_logprobs": [-0.015625, 0.0, -0.015625, -0.015625, 0.0, -0.390625, 0.0, 0.0, 0.0, 0.0],
			"top_logprobs": [],
			"tokens": [9707, 0, 2585, 646, 358, 7789, 498, 3351, 30, 151645]
		},
		"finish_reason": "stop",
		"message": {
			"role": "assistant",
			"content": "Hello! How can I assist you today?"
		}
	}],
	"usage": {
		"prompt_tokens": 21,
		"completion_tokens": 10,
		"total_tokens": 31
	}
}
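
The same endpoint can also be called from Python without the openai package. A minimal sketch using the requests library, parsing the fields shown in the response above:

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2-0.5B-Instruct-MLX",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)
data = resp.json()
# Extract the assistant reply and token usage from the JSON shown above.
print(data["choices"][0]["message"]["content"])
print(data["usage"]["total_tokens"])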

@bigsnarfdude

@awni mine only worked when I modified the target:

base_url="http://localhost:8080/v1/"

@awni (Author) commented Jul 8, 2024

Try updating MLX LM with pip install -U mlx-lm. It should work.
