
@pszemraj · Created November 5, 2025 00:40
Issue with llama.cpp server (multimodal): LFM2-VL

Llama.cpp Multimodal Crash (general issue write-up)

Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc

description

  • Symptom: llama-server exits with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
  • Repro: launch any vision-capable GGUF (e.g. a model with -VL- in its name, such as LFM2-VL) with default slot reuse (--slot-prompt-similarity 0.10), then hit /v1/chat/completions twice with OpenAI-format payloads that include image_url parts (base64 data URIs); a minimal sketch follows this list. The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
  • Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
  • Logs: the daemon sees "connection closed before message completed"; the server backtrace ends in ggml_abort inside server_context::update_slots.
  • This should give you enough keywords to search GitHub issues or file a new one against llama.cpp.
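
For reference, a minimal repro sketch (standard-library Python). The port, model name, and image file below are placeholder assumptions rather than details from the report; the script simply sends two back-to-back OpenAI-format multimodal requests, which is the pattern that trips the assert.

```python
# Minimal repro sketch. Assumes a vision-capable GGUF is already being served
# locally with default slot reuse, e.g.:
#   llama-server -m model.gguf --mmproj mmproj.gguf --port 8080
# (paths, port, and image file below are placeholders)
import base64
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"

with open("test.jpg", "rb") as f:  # any small image works
    data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = json.dumps({
    "model": "default",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
}).encode()

# Two back-to-back requests: the second often lands on a slot whose
# has_mtmd flag is still set, tripping the GGML_ASSERT server-side.
for i in range(2):
    req = urllib.request.Request(
        URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(f"request {i + 1}: HTTP {resp.status}")
    except Exception as e:
        print(f"request {i + 1}: failed ({e})")  # server has likely aborted
```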