I wanted to know whether Gemma 4 could replace a cloud model for my day-to-day agentic coding. Not in theory, in practice. I use Codex CLI every day, running GPT-5.4 as my default model. It works well, but every token costs money and every prompt sends my code to someone else's server. I also have friends thinking seriously about spending real money on local setups, and so far I hadn't been convinced such a setup would be useful for this kind of work. I was open to being wrong. Gemma 4 promised local tool calling that actually works. I spent a day finding out whether that held up once Codex CLI started reading files, writing patches, and running tests.
I set up two machines. A 24 GB M4 Pro MacBook Pro, the laptop I carry everywhere, running the 26B MoE variant via llama.cpp at Q4_K_M, the largest quantization that practically fits in that memory. And a Dell Pro Max GB10, 128 GB of unified memory on an NVIDIA Blackwell chip, running the 31B Dense variant.
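For anyone who wants to reproduce the wiring, here is a minimal sketch of what the local side looks like, assuming llama.cpp's llama-server for the OpenAI-compatible endpoint and Codex CLI's custom-provider config. The GGUF filename and model id are placeholders for whichever quant you actually download, not the exact artifacts I used:

```sh
# Serve the quantized model with an OpenAI-compatible API via llama.cpp's
# llama-server. The GGUF filename is a placeholder; -c sets the context
# window and -ngl offloads all layers to the GPU (Metal on the Mac).
llama-server -m gemma-4-26b-moe-Q4_K_M.gguf \
  --host 127.0.0.1 --port 8080 \
  -c 32768 -ngl 99

# Point Codex CLI at the local server through a custom provider.
# This overwrites ~/.codex/config.toml; merge by hand if you already have one.
cat > ~/.codex/config.toml <<'EOF'
model = "gemma-4-26b-moe"            # assumed id; must match what the server reports
model_provider = "llamacpp"

[model_providers.llamacpp]
name = "llama.cpp (local)"
base_url = "http://127.0.0.1:8080/v1"
wire_api = "chat"
EOF
```

Once the server is up, Codex talks to it the same way it would talk to a cloud endpoint, which is what makes a like-for-like comparison possible in the first place.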