This is the small llama-server harness I used to compare MoE CPU-offload
behavior across baseline and patched llama.cpp builds.
The benchmark is intentionally narrow: it measures sequential chat-completion latency and token throughput while exercising the CPU-MoE host-to-device expert staging path. It is not a model-quality eval, and the prompt strings are synthetic systems-engineering workload data used only to reproduce timing.