
@pszemraj · Created November 5, 2025 00:40
Issue with llama.cpp server (multimodal): LFM2-VL

Llama.cpp Multimodal Crash (general issue write-up)

Tue 04 Nov 2025 07:37:48 PM EST, commit a5c07dc

description

  • Symptom: llama-server exits with GGML_ASSERT(!slot.prompt.tokens.has_mtmd) inside server_context::update_slots() after a few multimodal requests.
  • Repro: launch any vision-capable GGUF (e.g. a model with -VL- in its name, such as LFM2-VL) with default slot reuse (--slot-prompt-similarity 0.10), then hit /v1/chat/completions twice with OpenAI-format payloads that include image_url parts (base64 data URIs); a minimal sketch follows this list. The second call often reuses a slot whose has_mtmd flag is still set, triggering the assert and a core dump.
  • Flags already tried: disabling similarity (--slot-prompt-similarity 0.0), restoring checkpoints (--ctx-checkpoints 8), toggling continuous batching. Crash still occurs on current master.
  • Logs: the daemon sees "connection closed before message completed"; the server backtrace ends in ggml_abort inside server_context::update_slots.
  • This should give you enough keywords to search GitHub issues or file a new one against llama.cpp.
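
For reference, a minimal repro sketch (standard-library Python). The port, model name, and image file below are placeholder assumptions rather than details from the report; the script simply sends two back-to-back OpenAI-format multimodal requests, which is the pattern that trips the assert.

```python
# Minimal repro sketch. Assumes a vision-capable GGUF is already being served
# locally with default slot reuse, e.g.:
#   llama-server -m model.gguf --mmproj mmproj.gguf --port 8080
# (paths, port, and image file below are placeholders)
import base64
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"

with open("test.jpg", "rb") as f:  # any small image works
    data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = json.dumps({
    "model": "default",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
}).encode()

# Two back-to-back requests: the second often lands on a slot whose
# has_mtmd flag is still set, tripping the GGML_ASSERT server-side.
for i in range(2):
    req = urllib.request.Request(
        URL, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req) as resp:
            print(f"request {i + 1}: HTTP {resp.status}")
    except Exception as e:
        print(f"request {i + 1}: failed ({e})")  # server has likely aborted
```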