On every machine in the cluster, install OpenMPI and mlx-lm:
```
conda install conda-forge::openmpi
pip install -U mlx-lm
```

Next, download the pipeline parallel run script. Download it to the same path on every machine:
```
curl -O https://raw.githubusercontent.com/ml-explore/mlx-examples/refs/heads/main/llms/mlx_lm/examples/pipeline_generate.py
```

Make a hosts.json file on the machine from which you plan to launch the generation. For two machines it should look like this:
```json
[
  {"ssh": "hostname1"},
  {"ssh": "hostname2"}
]
```
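If the cluster grows, the hostfile can be generated instead of written by hand. A minimal sketch (not part of mlx-lm; `write_hostfile` and the hostnames are placeholders):

```python
import json

def write_hostfile(hostnames, path="hosts.json"):
    """Write an mlx.launch hostfile: one {"ssh": name} entry per machine."""
    entries = [{"ssh": name} for name in hostnames]
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)
    return entries

# Example: two machines, matching the listing above.
write_hostfile(["hostname1", "hostname2"])
```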
Also make sure you can `ssh hostname` from every machine to every other machine. Check out the MLX documentation for more information on setting up and testing MPI.
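One way to verify connectivity before launching is to probe every host with a non-interactive ssh call. This is a sketch, not part of mlx-lm; it assumes key-based auth is already configured:

```python
import json
import subprocess

def ssh_probe_cmd(host):
    # BatchMode=yes makes ssh fail immediately instead of prompting for a
    # password, which is the behavior a launcher needs.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]

def check_cluster(hostfile="hosts.json"):
    """Return {hostname: True/False} for non-interactive ssh reachability."""
    with open(hostfile) as f:
        hosts = [entry["ssh"] for entry in json.load(f)]
    return {h: subprocess.run(ssh_probe_cmd(h)).returncode == 0 for h in hosts}
```

Run `check_cluster()` from each machine in turn, since every machine must reach every other.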
Set the wired limit on the machines so the GPU can use more memory. For example, on a 192 GB M2 Ultra set:

```
sudo sysctl iogpu.wired_limit_mb=180000
```

Run the generation with a command like the following:
```
mlx.launch \
  --hostfile path/to/hosts.json \
  --backend mpi \
  path/to/pipeline_generate.py \
  --prompt "What number is larger 6.9 or 6.11?" \
  --max-tokens 128 \
  --model mlx-community/DeepSeek-R1-4bit
```
For DeepSeek R1 quantized to 3-bit you need 350 GB of RAM in aggregate across the cluster, e.g. two 192 GB M2 Ultras. To run the model quantized to 4-bit you need 450 GB of RAM in aggregate, e.g. three 192 GB M2 Ultras.
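The RAM figures above can be sanity-checked with back-of-the-envelope arithmetic. DeepSeek R1 has roughly 671B parameters, so the weights alone take about `671e9 * bits / 8` bytes; the ~100 GB of headroom assumed here for activations, the KV cache, and the OS is an estimate, not a measured number:

```python
import math

PARAMS = 671e9  # approximate parameter count of DeepSeek R1

def weights_gb(bits):
    """Size of the quantized weights alone, in GB."""
    return PARAMS * bits / 8 / 1e9

def machines_needed(bits, ram_per_machine_gb=192, headroom_gb=100):
    """Machines required, assuming ~100 GB of cluster-wide headroom."""
    total = weights_gb(bits) + headroom_gb
    return math.ceil(total / ram_per_machine_gb)

print(round(weights_gb(3)))  # ~252 GB of weights at 3-bit
print(machines_needed(3))    # 2 machines at 192 GB each
print(machines_needed(4))    # 3 machines at 192 GB each
```

This reproduces the figures above: 3-bit fits on two 192 GB machines, 4-bit needs three.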


I used two 192 GB Mac Studios to run the DeepSeek R1 3-bit model with the following command:

```
mpirun -np 2 --hostfile hosts.txt python3 pipeline_generate.py --prompt "What number is larger 6.9 or 6.11?" --model mlx-community/DeepSeek-R1-3bit
```
I set the wired memory limit on both devices, but the remote device still crashed due to high memory usage. How can I solve this problem?