AMD’s Strix Halo APU is one of the most interesting consumer options right now for running local AI workloads. It is relatively power efficient, and its unified memory architecture lets the GPU access large amounts of system RAM. This makes it possible to run larger models locally without a discrete GPU.
As the software is still maturing, running models with GPU acceleration is not exactly straightforward yet. So I wrote this guide after getting a setup working