Awni Hannun awni

@awni
awni / l3min.py
Last active January 25, 2025 21:30
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"
@awni
awni / metal_in_python.py
Last active August 12, 2024 20:56
Compile and call a Metal GPU kernel from Python
# Requires:
# pip install pyobjc-framework-Metal
import numpy as np
import Metal
# Get the default GPU device
device = Metal.MTLCreateSystemDefaultDevice()
# Make a command queue to encode command buffers to
command_queue = device.newCommandQueue()

Use Lazy Loading to Reduce Peak Memory Use

Recall, MLX is lazy. No actual computation happens until you explicitly or implicitly evaluate the graph. Even loading arrays from a file is lazy:

weights = mx.load("model.safetensors")

Avoid Overly Frequent Graph Evaluations

MLX is lazy. No actual computation happens until you explicitly or implicitly evaluate the graph. Here are some ways that can happen:

  • Explicit call to mx.eval
  • Call a.item() on a scalar array
  • Convert an array to NumPy, i.e. np.array(a)
  • Print an array
from typing import Callable, Tuple
import operator
from functools import reduce
from itertools import product
import mlx.core as mx
def _interpolate(
x: mx.array, scale_factor: Tuple, indices_fn: Callable, align_corners: bool = False
):

MLX LM with the OpenAI Python Package

1. Install

Install MLX LM and openai:

pip install mlx-lm openai
@awni
awni / ops_data_dependent_shapes.md
Last active March 13, 2025 20:40
Working around operations with data-dependent shapes in MLX

Ops with Data Dependent Shapes

This is a short article on a common type of not-yet-supported operation in MLX: ops where the output shape depends on the input data. Here's an outline:

  1. An introduction to these operations, followed by an explanation of why they are challenging to implement efficiently.
  2. A discussion of when and how to work around these missing operations, with a couple of examples.
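As a quick illustration of what "data-dependent shape" means, here is a small sketch using NumPy, which does support such ops. NumPy and the sample values are assumptions made for illustration, and this is not necessarily one of the article's own examples. Two inputs with the same shape produce outputs with different shapes, while a masked reduction keeps every intermediate shape fixed.

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5])
y = np.array([-2, -7, 1, 8, -2])

# Boolean masking: the output shape depends on the values, not just the input shape.
pos_x = x[x > 0]  # shape (3,)
pos_y = y[y > 0]  # shape (2,)

# Fixed-shape alternative when only a reduction is needed:
# replace the unwanted entries with a neutral value, then reduce.
sum_pos_x = np.where(x > 0, x, 0).sum()
```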

The Ops

@awni
awni / run.py
Created March 29, 2024 02:52
Benchmark Mistral Graph Construction
import time
import mlx.core as mx
import mlx.nn as nn
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union
@dataclass
class ModelArgs:
@awni
awni / interspeech_2021_tutorial.ipynb
Created August 28, 2021 15:31
interspeech_2021_tutorial.ipynb