On every machine in the cluster, install openmpi and mlx-lm:
conda install conda-forge::openmpi
pip install -U mlx-lm
Next, download the pipeline parallel run script. Download it to the same path on every machine:
curl -O https://raw.githubusercontent.com/ml-explore/mlx-examples/refs/heads/main/llms/mlx_lm/examples/pipeline_generate.py
Make a hosts.json file on the machine from which you plan to launch the generation. For two machines it should look like this:
[
{"ssh": "hostname1"},
{"ssh": "hostname2"}
]
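For more than a couple of machines it can be easier to generate the file than to type it. The following is a minimal sketch, assuming only the format shown above (a JSON list with one {"ssh": ...} entry per machine); the helper name hostfile_text is made up for illustration and is not part of mlx-lm:

```python
import json


def hostfile_text(hostnames):
    """Return hosts.json content with one {"ssh": hostname} entry per machine."""
    entries = [{"ssh": name} for name in hostnames]
    return json.dumps(entries, indent=2)


# Placeholder hostnames; replace them with the names you use for ssh.
print(hostfile_text(["hostname1", "hostname2"]))
```

Redirect the printed output to hosts.json on the machine you launch from.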
Also make sure you can run ssh hostname from every machine to every other machine. Check out the MLX documentation for more information on setting up and testing MPI.
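One way to check the every-machine-to-every-other-machine requirement from the launch machine is to log in to each host and try a second ssh hop from there. This is a rough sketch, not an official test: the helper names are hypothetical, and the BatchMode and ConnectTimeout settings are just a choice to make a missing key or unreachable host fail quickly instead of prompting.

```python
import itertools
import shlex
import subprocess

# Fail fast instead of asking for a password or hanging on a dead host.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]


def pairwise_ssh_commands(hostnames):
    """Build one command per ordered (src, dst) pair: ssh to src, then ssh from src to dst."""
    return [
        ["ssh", *SSH_OPTS, src, "ssh", *SSH_OPTS, dst, "true"]
        for src, dst in itertools.permutations(hostnames, 2)
    ]


def failed_pairs(hostnames):
    """Run every check and return the commands that exited with a non-zero status."""
    return [
        cmd
        for cmd in pairwise_ssh_commands(hostnames)
        if subprocess.run(cmd, capture_output=True).returncode != 0
    ]


# Print the checks without running them; the hostnames are placeholders.
for cmd in pairwise_ssh_commands(["hostname1", "hostname2"]):
    print(shlex.join(cmd))
```

Calling failed_pairs(["hostname1", "hostname2"]) runs the checks; an empty list means every hop succeeded.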
Set the wired limit on the machines to use more memory. For example, on a 192 GB M2 Ultra set this:
sudo sysctl iogpu.wired_limit_mb=180000
Run the generation with a command like the following:
mlx.launch \
--hostfile path/to/hosts.json \
--backend mpi \
path/to/pipeline_generate.py \
--prompt "What number is larger 6.9 or 6.11?" \
--max-tokens 128 \
--model mlx-community/DeepSeek-R1-4bit
For DeepSeek R1 quantized to 3-bit you need 350 GB of aggregate RAM across the cluster of machines, e.g. two 192 GB M2 Ultras. To run the model quantized to 4-bit you need 450 GB of aggregate RAM, or three 192 GB M2 Ultras.
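The machine counts follow from dividing the aggregate requirement by what each machine can wire. A small sketch of that arithmetic, assuming every machine uses the 180000 MB wired limit from the example above and that the model is split evenly across machines (both are simplifications, and machines_needed is a made-up helper):

```python
import math


def machines_needed(model_gb, wired_limit_mb=180_000):
    """Smallest number of machines whose combined wired limit covers model_gb."""
    usable_gb = wired_limit_mb / 1000  # assumption: treat 1 GB as 1000 MB
    return math.ceil(model_gb / usable_gb)


# Aggregate RAM figures quoted above for the 3-bit and 4-bit models.
for bits, model_gb in [(3, 350), (4, 450)]:
    print(f"{bits}-bit: {model_gb} GB -> {machines_needed(model_gb)} machines")
# prints:
# 3-bit: 350 GB -> 2 machines
# 4-bit: 450 GB -> 3 machines
```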


Yes, I have set up sysctl on both devices, which allows me to control model downloads on the remote device. However, the downloads usually fail and report network issues. I am in China; could this be due to regional restrictions?
`mpirun -np 3 --hostfile hosts.txt python3 pipeline_generate.py --prompt "What number is larger 6.9 or 6.11?" --model mlx-community/DeepSeek-R1-3bit
/Users/zhangchi/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: urllib3/urllib3#3020
warnings.warn(
/Users/zhangchi/Library/Python/3.9/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: urllib3/urllib3#3020
warnings.warn(
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 22525.80it/s]
Fetching 5 files: 100%|██████████| 5/5 [00:00<00:00, 46707.17it/s]
Fetching 70 files: 17%|█▋ | 12/70 [06:54<37:14, 38.52s/it]--------------------------------------------------------------------------
PRTE has lost communication with a remote daemon.
HNP daemon : [prterun-Mac-Studio-6-41295@0,0] on node Mac-Studio-6
Remote daemon: [prterun-Mac-Studio-6-41295@0,1] on node Mac-Studio-8
This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------`