llama.cpp "SOFT_MAX failed: invalid argument" on consumer Blackwell GPUs (sm_120): root cause and a 12-line fix
If you searched for this error, you're in the right place:
ggml_cuda_compute_forward: SOFT_MAX failed
CUDA error: invalid argument
current device: 0, in function ggml_cuda_compute_forward at ggml/src/ggml-cuda/ggml-cuda.cu:2962