Running a 26B parameter model with only 6GB RAM using mmap
This guide covers running Gemma 4 26B MoE (Mixture of Experts) locally on an Apple Silicon Mac using llama.cpp with memory-mapped model files. The MoE architecture activates only 8 of its 128 experts per token, so only a small fraction of the weights has to be read for each forward pass.
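The low RAM footprint comes from demand paging: the kernel maps the GGUF file into the process's address space without loading it, and only the pages that are actually touched get faulted into memory. Below is a minimal C sketch of that mechanism, the same one llama.cpp relies on when it maps model weights; the file name `model.gguf` is just a placeholder for any large file.

```c
/*
 * Minimal sketch: map a file far larger than physical RAM and touch
 * only a few pages. Resident memory stays close to the pages actually
 * read, because mmap faults pages in on demand.
 * "model.gguf" is a placeholder path, not a real download.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *path = "model.gguf";            /* placeholder file name */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file: this reserves address space, not RAM. */
    const unsigned char *data =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch one byte every 256 MiB: only those pages are faulted in. */
    unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += 256L * 1024 * 1024)
        sum += data[off];

    printf("mapped %lld bytes, sampled checksum %lu\n",
           (long long)st.st_size, sum);

    munmap((void *)data, (size_t)st.st_size);
    close(fd);
    return 0;
}
```

llama.cpp uses mmap by default when the platform supports it; passing `--no-mmap` disables it and loads the full model into RAM instead.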
| Spec | Value |
|---|---|