Install CUDA deps:
sudo apt-get update
sudo apt-get install libcudnn9-dev-cuda-13
sudo apt-get install libblas-dev liblapack-dev liblapacke-dev
sudo apt-get install libnccl2 libnccl-dev
Install MLX:
Install CUDA deps:
sudo apt-get update
sudo apt-get install libcudnn9-dev-cuda-13
sudo apt-get install libblas-dev liblapack-dev liblapacke-dev
sudo apt-get install libnccl2 libnccl-dev
Install MLX:
The command for evaluating on MMLU Pro:
mlx_lm.evaluate --model model/repo --task mmlu_pro
The command for efficiency benchmarks:
import mlx.core as mx | |
# Possible tile size for tensor cores | |
TS = 32 | |
# Matrix dimension (M = N = K = D) | |
D = 2048 | |
A = mx.random.uniform(shape=(D, D)) | |
B = mx.random.uniform(shape=(D, D)) |
import math | |
import time | |
from functools import partial | |
import mlx.core as mx | |
import mlx.nn as nn | |
import mlx.optimizers as optim | |
import numpy as np | |
from mlx.utils import tree_flatten |
import argparse | |
import copy | |
import mlx.core as mx | |
from pathlib import Path | |
from mlx_lm import load, stream_generate | |
from mlx_lm.generate import generate_step | |
from mlx_lm.models.cache import make_prompt_cache | |
DEFAULT_MAX_TOKENS = 2048 |
You are an agent - please keep going until the user’s query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved, or if you need more info from the user to solve the problem. If you are not sure about anything pertaining to the user’s request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer. You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
First install the dependencies:
pip install mlx-lm openai
Then start the server:
mlx_lm.server
import argparse | |
import math | |
import mlx.core as mx | |
import mlx.nn as nn | |
from tqdm import tqdm | |
from mlx_lm.utils import load | |
from pathlib import Path | |
def eval_ppl(model, data, batch_size=32): |
class GLU: Module, UnaryLayer { | |
let dim: Int | |
init(dim: Int) { | |
self.dim = dim | |
} | |
func callAsFunction(_ x: MLXArray) -> MLXArray { | |
let (a, b) = x.split(axis: dim) | |
return a * MLXNN.sigmoid(b) |