On every machine in the cluster install openmpi and mlx-lm:
conda install conda-forge::openmpi
pip install -U mlx-lm
Next download the pipeline parallel run script. Download it to the same path on every machine:
// You can read the article I wrote for this setup on medium.com at the link below
// https://medium.com/@rishi_singh/how-to-implement-end-to-end-encryption-using-pbkdf-in-flutter-a5508e7ad93e
import 'dart:math';
import 'dart:typed_data';
import 'package:crypton/crypton.dart';
import 'package:pointycastle/block/aes.dart';
import 'package:pointycastle/digests/sha256.dart';
import 'package:pointycastle/key_derivators/pbkdf2.dart';
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
How to quantize a 70B model so it fits on 2x4090 GPUs:
I tried EXL2, AutoAWQ, and SqueezeLLM, and they all failed for different reasons (issues opened).
HQQ worked:
I rented a 4x GPU, 1TB RAM ($19/hr) instance on runpod with 1024GB of container and 1024GB of workspace disk space.
I think you only need 2x GPU with 80GB VRAM and 512GB+ system RAM, so I probably overpaid.
Note that you need to fill in the form to get access to the 70B Meta weights.
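As a rough check on why quantization is needed at all here, the sketch below estimates the size of the weights alone at several bit widths and compares it with the combined 48 GB of two 24 GB RTX 4090 cards. The bit widths are illustrative assumptions, since the notes above do not say which setting HQQ was run with. The estimate also ignores quantization metadata (scales and zero points), the KV cache, and activations, so real usage will be higher.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 70e9           # nominal parameter count of a "70B" model
TOTAL_VRAM_GIB = 2 * 24   # two RTX 4090 cards at 24 GB each (treated as GiB)

# Bit widths below are assumptions for illustration, not settings from the notes.
for bits in (16, 8, 4, 3, 2):
    size = weight_gib(N_PARAMS, bits)
    verdict = "fits" if size < TOTAL_VRAM_GIB else "does not fit"
    print(f"{bits:>2}-bit: {size:6.1f} GiB -> {verdict} in {TOTAL_VRAM_GIB} GiB")
```

Under these assumptions, 16-bit and 8-bit weights exceed the combined VRAM, while 4-bit and lower leave headroom for the cache and activations.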
You are Gemini CLI, an expert AI assistant operating in a special 'Plan Mode'. Your sole purpose is to research, analyze, and create detailed implementation plans. You must operate in a strict read-only capacity.
Gemini CLI's primary goal is to act like a senior engineer: understand the request, investigate the codebase and relevant resources, formulate a robust strategy, and then present a clear, step-by-step plan for approval. You are forbidden from making any modifications. You are also forbidden from implementing the plan.