https://github.com/ggerganov/llama.cpp/compare/master...ochafik:llama.cpp:model-args?expand=1
Fixes ggml-org/llama.cpp#6887
- `--model` is now inferred from `--model-url`/`-mu` or `--hf-file`/`-hff` if set (it still defaults to `models/7B/gguf-model-f16.gguf` otherwise). Downloading different URLs will no longer overwrite previous downloads.
- URL model downloads now write a `.json` companion metadata file (instead of the previous separate `.etag` & `.lastModified` files). This file also contains the URL itself, which is useful for remembering the exact origin of a model and prevents accidental overwrites of files.
- Log when etag / last-modified changes cause a re-download.
- Incidentally, enable the defaulting of `--hf-file` to `--model` on `server` (as was already done on `main`).
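The naming and companion-file behavior described above can be sketched roughly as follows. This is a minimal Python sketch of the logic, not the actual C++ implementation in llama.cpp; all function names here are illustrative:

```python
import json
import os
from urllib.parse import urlparse

def model_path_for(url: str, models_dir: str = "models") -> str:
    """Infer the local --model path from a download URL (basename of the URL path),
    so that different URLs map to different local files."""
    name = os.path.basename(urlparse(url).path)
    return os.path.join(models_dir, name)

def metadata_path(model_path: str) -> str:
    """The companion metadata file sits next to the model as <model>.json."""
    return model_path + ".json"

def write_metadata(model_path: str, url: str, etag: str, last_modified: str) -> None:
    """Record origin URL plus etag / lastModified in one JSON file."""
    with open(metadata_path(model_path), "w") as f:
        json.dump({"url": url, "etag": etag, "lastModified": last_modified}, f, indent=2)

def read_metadata(model_path: str) -> dict:
    """Return the stored metadata, or an empty dict if none exists yet."""
    try:
        with open(metadata_path(model_path)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

Because the local filename is derived from the URL's basename, two models downloaded from different URLs land in different files instead of clobbering each other.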
```bash
make clean && make -j LLAMA_CURL=1 main server
./main -p Test -n 100 -mu https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
./main -p Test -n 100 -hfr NousResearch/Meta-Llama-3-8B-Instruct-GGUF -hff Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
ls models/
# Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
# Meta-Llama-3-8B-Instruct-Q4_K_M.gguf.json
# Phi-3-mini-4k-instruct-q4.gguf
# Phi-3-mini-4k-instruct-q4.gguf.json
```
```bash
cat models/Phi-3-mini-4k-instruct-q4.gguf.json
# {
#   "url": "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf",
#   "etag": "\"b83ce18f1e735d825aa3402db6dae311-145\"",
#   "lastModified": "Thu, 25 Apr 2024 21:26:15 GMT"
# }
```
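The re-download decision that the logging bullet refers to can be sketched like this — a hypothetical Python sketch of the check, assuming the server's current `ETag` and `Last-Modified` headers have already been fetched; it is not the actual C++ code:

```python
def needs_redownload(local_meta: dict, remote_etag: str, remote_last_modified: str) -> bool:
    """Decide whether to re-download: True when no companion metadata exists yet,
    or when the server's ETag / Last-Modified differs from what was stored."""
    if not local_meta:
        return True  # first download: no companion .json on disk
    if local_meta.get("etag") and local_meta["etag"] != remote_etag:
        # this is where the new logging about etag changes would fire
        print(f"etag changed: {local_meta['etag']} -> {remote_etag}; re-downloading")
        return True
    if local_meta.get("lastModified") and local_meta["lastModified"] != remote_last_modified:
        print("lastModified changed; re-downloading")
        return True
    return False
```

Keeping both fields in one JSON file (rather than separate `.etag` and `.lastModified` files) means this check reads a single companion file per model.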