Awni Hannun (awni)

awni / PROMPT.md
Last active June 25, 2025 14:58
MLX LM with Tiny Agents

You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved, or if you need more info from the user to solve the problem. If you are not sure about anything pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer. You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
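This PROMPT.md is the agent's system prompt. A minimal sketch of the rest of the setup, assuming Hugging Face's tiny-agents CLI and an agent directory holding this PROMPT.md next to an agent.json; the model name, port, and JSON fields below are assumptions:

pip install mlx-lm "huggingface_hub[mcp]"
mlx_lm.server --model mlx-community/Qwen3-8B-4bit

agent.json:

{
  "model": "mlx-community/Qwen3-8B-4bit",
  "endpointUrl": "http://localhost:8080/v1"
}

Then run the agent with:

tiny-agents run path/to/agent-dir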

awni / mlx_lm_openai.md
Created June 7, 2025 19:25
MLX LM + OpenAI Client

First install the dependencies:

pip install mlx-lm openai

Then start the server:

mlx_lm.server
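
With the server up, point the standard OpenAI client at it. A sketch, assuming the server's default address of localhost:8080; the model name is illustrative and the API key is ignored:

from openai import OpenAI

# mlx_lm.server exposes an OpenAI-compatible API on localhost:8080 by default.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # illustrative model
    messages=[{"role": "user", "content": "Write a haiku about MLX."}],
)
print(response.choices[0].message.content)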
import argparse
import math
import mlx.core as mx
import mlx.nn as nn
from tqdm import tqdm
from mlx_lm.utils import load
from pathlib import Path

def eval_ppl(model, data, batch_size=32):
    # Body truncated in the preview; a sketch: mean next-token loss -> perplexity.
    all_loss = 0.0
    ntokens = 0
    for s in tqdm(range(0, len(data), batch_size)):
        batch = data[s : s + batch_size]
        logits = model(batch[:, :-1]).astype(mx.float32)
        losses = nn.losses.cross_entropy(logits, batch[:, 1:])
        all_loss += losses.sum().item()
        ntokens += losses.size
    return math.exp(all_loss / ntokens)

class GLU: Module, UnaryLayer {
    let dim: Int

    init(dim: Int) {
        self.dim = dim
    }

    func callAsFunction(_ x: MLXArray) -> MLXArray {
        // Split x in two halves along `dim` and gate one with the sigmoid of the other.
        let (a, b) = x.split(axis: dim)
        return a * MLXNN.sigmoid(b)
    }
}

awni / mlx_lm_open_webui.md
Created April 25, 2025 15:41
Open WebUI with MLX LM

Setup

Install packages:

pip install open-webui mlx-lm

Start Open WebUI server:
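
open-webui serve

Then point Open WebUI at the MLX LM server's OpenAI-compatible endpoint (http://localhost:8080/v1 by default) in its connection settings; the port is the mlx_lm.server default and is an assumption here.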

awni / README.md
Last active April 30, 2025 12:30
Test Time Scaling with R1-based Models and MLX LM

Test Time Scaling with MLX LM and R1-based LLMs

Install MLX LM:

pip install mlx-lm

And run:
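
For example, with an R1-distilled model (the model name, prompt, and token budget here are illustrative, and the gist's actual run command may differ):

mlx_lm.generate \
  --model mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit \
  --prompt "How many Rs are in the word strawberry?" \
  --max-tokens 4096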

awni / mlx_distributed_deepseek.md
Last active June 24, 2025 10:50
Run DeepSeek R1 or V3 with MLX Distributed

Setup

On every machine in the cluster install openmpi and mlx-lm:

conda install conda-forge::openmpi
pip install -U mlx-lm

Next download the pipeline parallel run script. Download it to the same path on every machine:
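
Once the script is in place on each machine, launch it across the cluster with MLX's distributed launcher. A sketch, where the hostnames, the script name pipeline_generate.py, and its --prompt flag are assumptions:

mlx.launch --backend mpi --hosts host1,host2 \
  pipeline_generate.py --prompt "Write a quicksort in C++."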

awni / CMakeLists.txt
Last active January 24, 2025 14:39
Minimal MLX CMake
cmake_minimum_required(VERSION 3.27)
project(example LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(
  Python 3.9
  COMPONENTS Interpreter Development.Module
  REQUIRED)
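
A sketch of how the file might continue, locating MLX's CMake package from the Python environment and linking the exported target; the `-m mlx --cmake-dir` lookup and the `mlx` target name are assumptions:

execute_process(
  COMMAND "${Python_EXECUTABLE}" -m mlx --cmake-dir
  OUTPUT_STRIP_TRAILING_WHITESPACE
  OUTPUT_VARIABLE MLX_ROOT)
find_package(MLX CONFIG REQUIRED)

add_executable(example example.cpp)
target_link_libraries(example PRIVATE mlx)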
awni / cpu_quantize.py
Created October 17, 2024 15:57
Faster CPU HF to MLX conversion script
import argparse
from functools import partial
import multiprocessing as mp
from typing import Callable, Optional
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map_with_path
from mlx_lm.utils import *
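
The wildcard import pulls in MLX LM's loading helpers. The core of the conversion is quantizing on the CPU; a minimal sketch of that step, using the imports above (the path and settings are illustrative, and the gist's actual script parallelizes the work with multiprocessing):

mx.set_default_device(mx.cpu)  # keep the conversion on the CPU
model, tokenizer = load("path/to/float16_model")  # illustrative path
nn.quantize(model, group_size=64, bits=4)  # swap in quantized layers in place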
awni / llms_on_ios.md
Last active June 21, 2025 11:25
A step-by-step guide to run an LLM on an iPhone with MLX Swift