This llama-server setup is tuned specifically for my AMD Radeon RX 7900 XTX, running Gemma 4 26B A4B quantized by Unsloth.
I've configured it for stability first, while keeping as much practical quality as the VRAM limit allows.
This configuration uses 99% of the card's 24 GB of VRAM, so there's no headroom left for a larger context or a higher-precision quant.
With a single user, I see token generation speeds of roughly 100-120 tokens/second.
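For reference, a launch command along these lines is what such a setup typically looks like. This is a sketch, not my exact invocation: the model path is a placeholder, and the context size and port are assumptions you'd adjust to your own VRAM budget. The flags themselves (`-m`, `-ngl`, `-c`, `--flash-attn`, `--host`, `--port`) are standard llama.cpp `llama-server` options.

```shell
# Hypothetical launch sketch -- adjust paths and sizes to your hardware.
llama-server \
  -m /path/to/model-Q4_K_M.gguf \  # placeholder path to the Unsloth GGUF quant
  -ngl 99 \                        # offload all layers to the GPU
  -c 16384 \                       # assumed context size; raise/lower to fit VRAM
  --flash-attn \                   # reduces KV-cache memory pressure
  --host 127.0.0.1 --port 8080     # serve the OpenAI-compatible API locally
```

Once running, the server exposes an OpenAI-compatible endpoint at `http://127.0.0.1:8080/v1`, so any client that speaks that API can point at it directly.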