Install llama.cpp: go to https://github.com/ggml-org/llama.cpp and follow the build instructions for your platform. Alternatively, you can use another inference engine of your choice.
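For example, on Linux with an NVIDIA GPU, a CUDA-enabled build might look like the following sketch (taken from the llama.cpp build docs; the model path is a placeholder, adjust everything for your platform):

```sh
# Clone and build llama.cpp with CUDA support (assumes the CUDA toolkit is installed).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

# Quick smoke test: -ngl 99 offloads all layers to the GPU.
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99 -p "Hello"
```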
Install llama-swap: go to https://github.com/mostlygeek/llama-swap and follow the instructions there.
The best model that fits in 12 GB of VRAM to date is https://huggingface.co/prithivMLmods/Ophiuchi-Qwen3-14B-Instruct. Choose a quantization that fits in VRAM while still leaving enough room for the context; Nanobrowser uses a lot of tokens (>10K). As a rough guide, a Q4_K_M quant of a 14B model is about 9 GB, which leaves roughly 3 GB of a 12 GB card for the KV cache.
In the llama-swap config, set `ttl: 300` on the model entry so an idle model is unloaded after 300 seconds and its VRAM is freed.
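Putting it together, a minimal llama-swap `config.yaml` sketch could look like this (the GGUF filename and paths are assumptions, and `${PORT}` is llama-swap's placeholder for the port it assigns; see the llama-swap README for the full schema):

```yaml
models:
  "ophiuchi-qwen3-14b":
    # llama-swap substitutes ${PORT} with the port it proxies to.
    # The GGUF path below is a placeholder; point it at the quant
    # you downloaded (one that leaves VRAM room for >10K context).
    cmd: |
      llama-server --port ${PORT}
      -m /models/Ophiuchi-Qwen3-14B-Instruct-Q4_K_M.gguf
      -c 16384
      -ngl 99
    # Unload after 300 s of inactivity to free the 12 GB card.
    ttl: 300
```

Point Nanobrowser at llama-swap's OpenAI-compatible endpoint; llama-swap starts llama-server on the first request and shuts it down once the TTL expires.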