Awni Hannun awni
awni / README.md
Last active February 19, 2025 02:27
Test Time Scaling with R1-based Models and MLX LM

Test Time Scaling with MLX LM and R1-based LLMs

Install MLX LM:

pip install mlx-lm

And run:

awni / mlx_distributed_deepseek.md
Last active February 26, 2025 14:56
Run DeepSeek R1 or V3 with MLX Distributed

Setup

On every machine in the cluster, install openmpi and mlx-lm:

conda install conda-forge::openmpi
pip install -U mlx-lm

Next, download the pipeline-parallel run script to the same path on every machine:

awni / CMakeLists.txt
Last active January 24, 2025 14:39
Minimal MLX CMake
cmake_minimum_required(VERSION 3.27)
project(example LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(
  Python 3.9
  COMPONENTS Interpreter Development.Module
awni / cpu_quantize.py
Created October 17, 2024 15:57
Faster CPU HF to MLX conversion script
import argparse
from functools import partial
import multiprocessing as mp
from typing import Callable, Optional
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map_with_path
from mlx_lm.utils import *
awni / llms_on_ios.md
Last active February 19, 2025 15:47
A step-by-step guide to run an LLM on an iPhone with MLX Swift
awni / MLX_0_20_0.pdf
Last active December 15, 2024 02:28
MLX Documentation PDF Versions
awni / resnet_mlx.py
Created September 7, 2024 20:02
MLX ResNet18 Inference Benchmark
from huggingface_hub import snapshot_download
import mlx.core as mx
import mlx.nn as nn
import time
class Block(nn.Module):
    def __init__(self, in_dims, dims, stride=1):
        super().__init__()
awni / fast_conway_mlx.py
Last active February 7, 2025 21:39
Conway's Game of Life Accelerated with Custom Kernels in MLX
import numpy as np
import mlx.core as mx
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import tqdm
def conway(a: mx.array):
    source = """
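The preview above is cut off before the kernel source, so the custom kernel itself is not shown here. For orientation only, the following is a minimal NumPy sketch of a single Game of Life update. It is not the gist's kernel: the function name `conway_step` and the wrap-around (toroidal) edge handling are assumptions made for illustration.

```python
import numpy as np


def conway_step(grid: np.ndarray) -> np.ndarray:
    """Advance a 2D 0/1 grid by one generation (edges wrap around)."""
    # Count the eight neighbors of every cell by summing shifted copies.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Alive next step: exactly 3 neighbors, or currently alive with 2.
    alive = (neighbors == 3) | ((grid == 1) & (neighbors == 2))
    return alive.astype(grid.dtype)


# Usage: a "blinker" alternates between a horizontal and a vertical bar.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
after_one = conway_step(blinker)
after_two = conway_step(after_one)
```

A reference like this can serve as a correctness check for a faster implementation by comparing outputs on small grids.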
awni / mlx_api_prompt.py
Created August 20, 2024 15:43
Meta Llama 3.1 with MLX LM and the MLX Python API as Context
import os
import mlx.core as mx
from mlx_lm import load, generate
filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
with open(filename, 'r') as fid:
    prompt = fid.read()
prompt += "\nHow do you write a self-attention layer using the above API in MLX?"
model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")
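The preview ends after the model is loaded. The prompt-assembly step it shows (read a file, then append a question on a new line) can be isolated as a small helper. This is a sketch only: `build_prompt` is a hypothetical name, and the temporary stub file stands in for the real `core/__init__.pyi` so the example runs without MLX installed.

```python
import os
import tempfile
from pathlib import Path


def build_prompt(context_path: str, question: str) -> str:
    """Return the file's text followed by a newline and the question."""
    context = Path(context_path).read_text(encoding="utf-8")
    return context + "\n" + question


# Usage with a stand-in stub file (hypothetical content, not the MLX API).
with tempfile.TemporaryDirectory() as tmp:
    stub = os.path.join(tmp, "api.pyi")
    Path(stub).write_text("def add(a, b): ...\n", encoding="utf-8")
    prompt = build_prompt(stub, "How do you add two arrays?")
```

The resulting string is what would be handed to the model as the prompt; how the original script does that is not visible in the truncated preview.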

Set up the repo

git clone git@github.com:filipstrand/mflux.git
cd mflux && pip install -r requirements.txt

Make a run script

Name this anything, maybe flux.py. Make sure to update the two paths marked below.