Awni Hannun (awni)

awni / CMakeLists.txt
Last active October 23, 2024 22:00
Minimal MLX CMake
cmake_minimum_required(VERSION 3.27)
project(_ext LANGUAGES CXX)
# ----------------------------- Setup -----------------------------
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
option(BUILD_SHARED_LIBS "Build as a shared library" ON)
awni / cpu_quantize.py
Created October 17, 2024 15:57
Faster CPU HF to MLX conversion script
import argparse
from functools import partial
import multiprocessing as mp
from typing import Callable, Optional
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map_with_path
from mlx_lm.utils import *
awni / MLX_0_17_3.pdf
Last active September 26, 2024 11:32
MLX Documentation PDF Versions
awni / resnet_mlx.py
Created September 7, 2024 20:02
MLX ResNet18 Inference Benchmark
from huggingface_hub import snapshot_download
import mlx.core as mx
import mlx.nn as nn
import time
class Block(nn.Module):
    def __init__(self, in_dims, dims, stride=1):
        super().__init__()
awni / fast_conway_mlx.py
Last active October 4, 2024 04:33
Conway's Game of Life Accelerated with Custom Kernels in MLX
import numpy as np
import mlx.core as mx
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import tqdm
def conway(a: mx.array):
    source = """
awni / mlx_api_prompt.py
Created August 20, 2024 15:43
Meta Llama 3.1 with MLX LM and the MLX Python API as Context
import os
import mlx.core as mx
from mlx_lm import load, generate
filename = os.path.join(os.path.dirname(mx.__file__), "core/__init__.pyi")
with open(filename, 'r') as fid:
    prompt = fid.read()
prompt += "\nHow do you write a self-attention layer using the above API in MLX?"
model, tokenizer = load("mlx-community/meta-Llama-3.1-8B-Instruct-4bit")
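The preview above stops before the generation step. A minimal continuation, assuming the usual mlx_lm chat-template and generate API (the option values here are illustrative, not from the original gist):
# Wrap the combined API-plus-question prompt in the chat template and generate.
messages = [{"role": "user", "content": prompt}]
chat_prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
response = generate(model, tokenizer, prompt=chat_prompt, max_tokens=512, verbose=True)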

Set up the repo

git clone git@github.com:filipstrand/mflux.git
cd mflux && pip install -r requirements.txt

Make a run script

Name this anything, maybe flux.py. Make sure to update the two paths marked below.

awni / l3min.py
Last active November 2, 2024 16:06
A minimal, fast implementation of Llama 3.1 in MLX.
"""
A minimal, fast example generating text with Llama 3.1 in MLX.
To run, install the requirements:
pip install -U mlx transformers fire
Then generate text with:
python l3min.py "How tall is K2?"
awni / metal_in_python.py
Last active August 12, 2024 20:56
Compile and call a Metal GPU kernel from Python
# Requires:
# pip install pyobjc-framework-Metal
import numpy as np
import Metal
# Get the default GPU device
device = Metal.MTLCreateSystemDefaultDevice()
# Make a command queue to encode command buffers to
command_queue = device.newCommandQueue()
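The remainder of the gist is not shown; a sketch of the next steps, assuming PyObjC's usual bridging (selector colons become underscores and the trailing NSError out-parameter comes back in the return tuple):
# Compile a small (hypothetical) kernel source string into a Metal library.
kernel_source = """
#include <metal_stdlib>
using namespace metal;
kernel void add_one(device float *x [[buffer(0)]],
                    uint i [[thread_position_in_grid]]) {
    x[i] += 1.0f;
}
"""
library, error = device.newLibraryWithSource_options_error_(kernel_source, None, None)
kernel_fn = library.newFunctionWithName_("add_one")
pipeline, error = device.newComputePipelineStateWithFunction_error_(kernel_fn, None)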

Use Lazy Loading to Reduce Peak Memory Use

Recall that MLX is lazy: no actual computation happens until you explicitly or implicitly evaluate the graph. Even loading arrays from a file is lazy:

weights = mx.load("model.safetensors")
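
Nothing is materialized at this point beyond the file metadata. A minimal sketch of why that matters (the float16 cast is just an illustrative transformation, not part of the original snippet): transform the weights lazily, then evaluate once, so the full-precision copies never all live in memory at the same time.

import mlx.core as mx
from mlx.utils import tree_map

weights = mx.load("model.safetensors")  # lazy: nothing loaded yet

# Illustrative: cast every weight to float16 before evaluating.
weights = tree_map(lambda w: w.astype(mx.float16), weights)

# Evaluation streams each array through the cast, so peak memory can stay
# close to the size of the float16 weights rather than float32 plus float16.
mx.eval(weights)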