Awni Hannun awni

Avoid Overly Frequent Graph Evaluations

MLX is lazy. No actual computation happens until you explicitly or implicitly evaluate the graph. Here are some ways that can happen:

  • Explicit call to mx.eval
  • Call a.item() on a scalar array
  • Convert an array to NumPy, e.g. np.array(a)
  • Print an array
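
To make the cost of evaluating too often concrete, here is a toy pure-Python model of lazy evaluation. It is only a sketch and not MLX's implementation: `LazyValue`, `evaluate`, and `run` are hypothetical names invented for illustration, and the counter simply tracks how many times evaluation is triggered. The idea it shows is that building the graph is cheap bookkeeping, while each evaluation is a separate unit of work, so evaluating once per step triggers far more evaluations than evaluating once per batch of steps.

```python
class LazyValue:
    """Toy stand-in for a lazy array (hypothetical, not the MLX API)."""

    eval_calls = 0  # number of explicit evaluate() calls so far

    def __init__(self, fn=None, parents=(), value=None):
        self.fn = fn            # operation to run when forced
        self.parents = parents  # inputs to the operation
        self.value = value      # None until computed

    def __add__(self, other):
        # Record the addition in the graph; nothing is computed here.
        return LazyValue(lambda a, b: a + b, (self, other))

    def _force(self):
        # Compute this node, and any unevaluated inputs, on demand.
        if self.value is None:
            args = [p._force() for p in self.parents]
            self.value = self.fn(*args)
            self.parents = ()  # the graph below this node is no longer needed
        return self.value

    def evaluate(self):
        # Plays the role of an explicit evaluation request.
        LazyValue.eval_calls += 1
        return self._force()


def run(steps, eval_every):
    """Add 1 `steps` times, evaluating every `eval_every` steps."""
    LazyValue.eval_calls = 0
    x = LazyValue(value=0)
    one = LazyValue(value=1)
    for step in range(1, steps + 1):
        x = x + one
        if step % eval_every == 0:
            x.evaluate()
    # One final evaluation to read the result.
    return x.evaluate(), LazyValue.eval_calls


frequent = run(100, eval_every=1)   # (100, 101)
batched = run(100, eval_every=10)   # (100, 11)
print(frequent, batched)
```

Both runs produce the same result, but the second triggers roughly a tenth as many evaluations. In a real framework each evaluation carries fixed overhead, which is why the interval is worth tuning.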

Use Lazy Loading to Reduce Peak Memory Use

Recall that MLX is lazy. No actual computation happens until you explicitly or implicitly evaluate the graph. Even loading arrays from a file is lazy:

weights = mx.load("model.safetensors")
@awni
awni / metal_in_python.py
Last active August 12, 2024 20:56
Compile and call a Metal GPU kernel from Python
# Requires:
# pip install pyobjc-framework-Metal
import numpy as np
import Metal
# Get the default GPU device
device = Metal.MTLCreateSystemDefaultDevice()
# Make a command queue to encode command buffers to
command_queue = device.newCommandQueue()
@awni
awni / l3min.py
Last active January 25, 2025 21:30
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"

Set up the repo

git clone git@github.com:filipstrand/mflux.git
cd mflux && pip install -r requirements.txt

Make a run script

Name this anything, maybe flux.py. Make sure to update the two paths marked below.

@awni
awni / mlx_api_prompt.py
Created August 20, 2024 15:43
Meta Llama 3.1 with MLX LM and the MLX Python API as Context
import os
import mlx.core as mx
from mlx_lm import load, generate
filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
with open(filename, 'r') as fid:
    prompt = fid.read()
prompt += "\nHow do you write a self-attention layer using the above API in MLX?"
model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")
@awni
awni / fast_conway_mlx.py
Last active February 7, 2025 21:39
Conway's Game of Life Accelerated with Custom Kernels in MLX
import numpy as np
import mlx.core as mx
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import tqdm
def conway(a: mx.array):
    source = """
@awni
awni / resnet_mlx.py
Created September 7, 2024 20:02
MLX ResNet18 Inference Benchmark
from huggingface_hub import snapshot_download
import mlx.core as mx
import mlx.nn as nn
import time
class Block(nn.Module):
    def __init__(self, in_dims, dims, stride=1):
        super().__init__()
@awni
awni / MLX_0_20_0.pdf
Last active December 15, 2024 02:28
MLX Documentation PDF Versions
@awni
awni / llms_on_ios.md
Last active March 14, 2025 11:41
A step-by-step guide to run an LLM on an iPhone with MLX Swift