Awni Hannun (awni)

awni / CMakeLists.txt
Last active October 23, 2024 22:00
Minimal MLX CMake
cmake_minimum_required(VERSION 3.27)
project(_ext LANGUAGES CXX)
# ----------------------------- Setup -----------------------------
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
option(BUILD_SHARED_LIBS "Build as a shared library" ON)
awni / cpu_quantize.py
Created October 17, 2024 15:57
Faster CPU HF to MLX conversion script
import argparse
from functools import partial
import multiprocessing as mp
from typing import Callable, Optional
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map_with_path
from mlx_lm.utils import *
awni / MLX_0_17_3.pdf
Last active September 26, 2024 11:32
MLX Documentation PDF Versions
awni / resnet_mlx.py
Created September 7, 2024 20:02
MLX ResNet18 Inference Benchmark
from huggingface_hub import snapshot_download
import mlx.core as mx
import mlx.nn as nn
import time
class Block(nn.Module):
    def __init__(self, in_dims, dims, stride=1):
        super().__init__()
awni / fast_conway_mlx.py
Last active October 4, 2024 04:33
Conway's Game of Life Accelerated with Custom Kernels in MLX
import numpy as np
import mlx.core as mx
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import tqdm
def conway(a: mx.array):
    source = """
awni / mlx_api_prompt.py
Created August 20, 2024 15:43
Meta Llama 3.1 with MLX LM and the MLX Python API as Context
import os
import mlx.core as mx
from mlx_lm import load, generate
filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
with open(filename, 'r') as fid:
    prompt = fid.read()
prompt += "\nHow do you write a self-attention layer using the above API in MLX?"
model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")
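The preview above stops before the generation step. A minimal continuation, assuming the usual mlx_lm chat-template and generate API (the option values here are illustrative, not from the original gist):
# Wrap the combined API-plus-question prompt in the chat template and generate.
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
response = generate(model, tokenizer, prompt=chat_prompt, max_tokens=512, verbose=True)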

Set up the repo

git clone git@github.com:filipstrand/mflux.git
cd mflux && pip install -r requirements.txt

Make a run script

Name this anything, maybe flux.py. Make sure to update the two paths marked below.

awni / l3min.py
Last active November 2, 2024 16:06
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"
awni / metal_in_python.py
Last active August 12, 2024 20:56
Compile and call a Metal GPU kernel from Python
# Requires:
# pip install pyobjc-framework-Metal
import numpy as np
import Metal
# Get the default GPU device
device = Metal.MTLCreateSystemDefaultDevice()
# Make a command queue to encode command buffers to
command_queue = device.newCommandQueue()
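The remainder of the gist is not shown; a sketch of the next steps, assuming PyObjC's usual bridging (selector colons become underscores and the trailing NSError out-parameter comes back in the return tuple):
# Compile a small (hypothetical) kernel source string into a Metal library.
kernel_source = """
#include <metal_stdlib>
using namespace metal;
kernel void add_one(device float *x [[buffer(0)]],
                    uint i [[thread_position_in_grid]]) {
    x[i] += 1.0f;
}
"""
library, error = device.newLibraryWithSource_options_error_(kernel_source, None, None)
kernel_fn = library.newFunctionWithName_("add_one")
pipeline, error = device.newComputePipelineStateWithFunction_error_(kernel_fn, None)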

Use Lazy Loading to Reduce Peak Memory Use

Recall that MLX is lazy: no actual computation happens until you explicitly or implicitly evaluate the graph. Even loading arrays from a file is lazy:

weights = mx.load("model.safetensors")
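
Nothing is materialized at this point beyond the file metadata. A minimal sketch of why that matters (the float16 cast is just an illustrative transformation, not part of the original snippet): transform the weights lazily, then evaluate once, so the full-precision copies never all live in memory at the same time.

import mlx.core as mx
from mlx.utils import tree_map

weights = mx.load("model.safetensors")  # lazy: nothing loaded yet

# Illustrative: cast every weight to float16 before evaluating.
weights = tree_map(lambda w: w.astype(mx.float16), weights)

# Evaluation streams each array through the cast, so peak memory can stay
# close to the size of the float16 weights rather than float32 plus float16.
mx.eval(weights)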