Marco Craveiro mcraveiro

Dad, Husband, PhD, Senior Software Engineer, FX Derivatives geek, Angolan-Portuguese.

mcraveiro / Notes.txt

Created May 23, 2026 17:29

Set up llama.cpp with MTP support for my 2090 NVIDIA card

	# Get the latest code from https://github.com/ggml-org/llama.cpp,
	# then build with CUDA support.
	CUDACXX=/usr/local/cuda/bin/nvcc cmake -DGGML_CUDA=ON --preset x64-linux-gcc-release
	cd build-x64-linux-gcc-release
	ninja -j3

	# Note: this points at the unsloth MTP Q2 model: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF?show_file_info=Qwen3.6-27B-UD-IQ2_XXS.gguf
	# and enables MTP. Experiment with n-max locally (1-3).
	# no-mmproj: disables non-text modalities so the model fits in my 11 GB of VRAM.
	./llama-server -hf unsloth/Qwen3.6-27B-GGUF:UD-IQ2_XXS \

mcraveiro / Screenshot from 2019-09-23 08-24-16.png

Last active September 23, 2019 07:32

Cascadia Code font is fairly usable on Emacs