A practical, end-to-end guide: prerequisites → build (ROCm + Vulkan) → run → quantize → wire into the pi-mono coding agent.
Hardware context: AMD Ryzen AI Max+ 395 ("Strix Halo") with integrated Radeon 8060S (gfx1151, RDNA 3.5, 40 CUs). Of the 128 GB of unified memory, 96 GB is allocated as VRAM/GTT. This is an APU with a unified memory architecture (UMA): the CPU and GPU share the same physical RAM, which fundamentally changes how you should think about "VRAM" and --no-mmap.