systemd journal for ollama.service on host strix — created June 24, 2025 16:27. Ollama 0.9.0 loads Qwen2.5 Coder 7B Instruct (Q4_K_M GGUF) onto an NVIDIA GeForce RTX 4060 Laptop GPU.
Jun 24 17:53:31 strix systemd[1]: Started Ollama Service.
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.169+02:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
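
The "server config" map above lists every knob the daemon started with; note OLLAMA_HOST:http://127.0.0.1:11434 (loopback only) and OLLAMA_CONTEXT_LENGTH:4096 (default per-request context). A minimal client sketch that honors the same OLLAMA_HOST convention — the Python snippet is illustrative, not part of this log:

    # Illustrative sketch: query the daemon at the OLLAMA_HOST shown above.
    import json, os, urllib.request

    host = os.environ.get("OLLAMA_HOST", "http://127.0.0.1:11434")
    with urllib.request.urlopen(f"{host}/api/version") as resp:
        print(json.load(resp))  # e.g. {"version": "0.9.0"}
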
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.171+02:00 level=INFO source=images.go:479 msg="total blobs: 17"
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.171+02:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Jun 24 17:53:31 strix ollama[15137]: - using env: export GIN_MODE=release
Jun 24 17:53:31 strix ollama[15137]: - using code: gin.SetMode(gin.ReleaseMode)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
Jun 24 17:53:31 strix ollama[15137]: [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
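
Gin has now registered the full route table: native endpoints under /api/* plus an OpenAI-compatible surface under /v1/*. The GET routes can be exercised with plain HTTP, for example (illustrative sketch, not from this gist):

    import json, urllib.request

    base = "http://127.0.0.1:11434"  # see the "Listening on" line below
    for path in ("/api/tags", "/api/ps"):   # installed models / loaded models
        with urllib.request.urlopen(base + path) as resp:
            print(path, json.load(resp))
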
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.171+02:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11434 (version 0.9.0)"
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.171+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Jun 24 17:53:31 strix ollama[15137]: time=2025-06-24T17:53:31.436+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-7b5f2078-943e-eab2-e6d3-5b7492dcdb7e library=cuda variant=v12 compute=8.9 driver=12.9 name="NVIDIA GeForce RTX 4060 Laptop GPU" total="7.6 GiB" available="7.2 GiB"
Jun 24 17:53:35 strix ollama[15137]: [GIN] 2025/06/24 - 17:53:35 | 200 | 71.556µs | 127.0.0.1 | GET "/api/version"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 696.258µs | 127.0.0.1 | GET "/api/tags"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 318.985µs | 127.0.0.1 | GET "/api/tags"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 28.456298ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 30.083599ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 30.319265ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 33.496928ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 25.578983ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 30.963229ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 33.550369ms | 127.0.0.1 | POST "/api/show"
Jun 24 17:54:20 strix ollama[15137]: [GIN] 2025/06/24 - 17:54:20 | 200 | 35.769237ms | 127.0.0.1 | POST "/api/show"
Jun 24 18:16:21 strix ollama[15137]: [GIN] 2025/06/24 - 18:16:21 | 200 | 26.347µs | 127.0.0.1 | HEAD "/"
Jun 24 18:16:21 strix ollama[15137]: [GIN] 2025/06/24 - 18:16:21 | 200 | 120.637µs | 127.0.0.1 | GET "/api/ps"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.346+02:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/var/lib/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 gpu=GPU-7b5f2078-943e-eab2-e6d3-5b7492dcdb7e parallel=2 available=7574585344 required="6.5 GiB"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.447+02:00 level=INFO source=server.go:135 msg="system memory" total="31.0 GiB" free="26.3 GiB" free_swap="7.8 GiB"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.447+02:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[7.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.5 GiB" memory.required.partial="6.5 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[6.5 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="942.0 MiB" memory.graph.partial="1.1 GiB"
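
Before spawning a runner, the scheduler estimates whether the model fits in VRAM. A back-of-the-envelope check of the figures above (constants copied from the two log lines; the gap between the itemized buffers and the 6.5 GiB estimate is un-itemized runtime overhead, an assumption on my part):

    GIB = 1024**3
    available = 7574585344                       # bytes free on the GPU (sched line)
    required  = 6.5 * GIB                        # scheduler's full-offload estimate
    itemized  = 4.1 + 896 / 1024 + 942 / 1024    # weights + KV cache + graph, GiB
    print(f"itemized ~ {itemized:.2f} GiB")                       # ~5.89 GiB
    print(f"headroom ~ {(available - required) / GIB:.2f} GiB")   # ~0.55 GiB -> fits
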
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /var/lib/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 0: general.architecture str = qwen2
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 1: general.type str = model
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 3: general.finetune str = Instruct
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 5: general.size_label str = 7B
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 6: general.license str = apache-2.0
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 22: general.file_type u32 = 15
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - type f32: 141 tensors
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - type q4_K: 169 tensors
Jun 24 18:16:53 strix ollama[15137]: llama_model_loader: - type q6_K: 29 tensors
Jun 24 18:16:53 strix ollama[15137]: print_info: file format = GGUF V3 (latest)
Jun 24 18:16:53 strix ollama[15137]: print_info: file type = Q4_K - Medium
Jun 24 18:16:53 strix ollama[15137]: print_info: file size = 4.36 GiB (4.91 BPW)
Jun 24 18:16:53 strix ollama[15137]: load: special tokens cache size = 22
Jun 24 18:16:53 strix ollama[15137]: load: token to piece cache size = 0.9310 MB
Jun 24 18:16:53 strix ollama[15137]: print_info: arch = qwen2
Jun 24 18:16:53 strix ollama[15137]: print_info: vocab_only = 1
Jun 24 18:16:53 strix ollama[15137]: print_info: model type = ?B
Jun 24 18:16:53 strix ollama[15137]: print_info: model params = 7.62 B
Jun 24 18:16:53 strix ollama[15137]: print_info: general.name = Qwen2.5 Coder 7B Instruct
Jun 24 18:16:53 strix ollama[15137]: print_info: vocab type = BPE
Jun 24 18:16:53 strix ollama[15137]: print_info: n_vocab = 152064
Jun 24 18:16:53 strix ollama[15137]: print_info: n_merges = 151387
Jun 24 18:16:53 strix ollama[15137]: print_info: BOS token = 151643 '<|endoftext|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOS token = 151645 '<|im_end|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOT token = 151645 '<|im_end|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: PAD token = 151643 '<|endoftext|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: LF token = 198 'Ċ'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM REP token = 151663 '<|repo_name|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOG token = 151643 '<|endoftext|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOG token = 151645 '<|im_end|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOG token = 151662 '<|fim_pad|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOG token = 151663 '<|repo_name|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: EOG token = 151664 '<|file_sep|>'
Jun 24 18:16:53 strix ollama[15137]: print_info: max token length = 256
Jun 24 18:16:53 strix ollama[15137]: llama_model_load: vocab only - skipping tensors
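
This first pass loads the vocabulary only (vocab_only = 1), which is why tensors are skipped; the runner repeats the dump during the full load below. Two quick sanity checks against the printed metadata (arithmetic sketch, all values from the log):

    tensors = 141 + 169 + 29           # f32 + q4_K + q6_K tensor counts
    print(tensors)                     # 339, matching "339 tensors" in the header
    bpw = 4.36 * 1024**3 * 8 / 7.62e9  # file size in bits / parameter count
    print(f"{bpw:.2f} BPW")            # ~4.91, matching "4.36 GiB (4.91 BPW)"
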
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.746+02:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/bin/ollama runner --model /var/lib/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --ctx-size 16384 --batch-size 512 --n-gpu-layers 29 --threads 6 --parallel 2 --port 40537"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.747+02:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.747+02:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.747+02:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
Jun 24 18:16:53 strix ollama[15137]: time=2025-06-24T18:16:53.754+02:00 level=INFO source=runner.go:815 msg="starting go runner"
Jun 24 18:16:53 strix ollama[15137]: load_backend: loaded CPU backend from /usr/lib64/ollama/libggml-cpu-x64.so
Jun 24 18:16:54 strix ollama[15137]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
Jun 24 18:16:54 strix ollama[15137]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Jun 24 18:16:54 strix ollama[15137]: ggml_cuda_init: found 1 CUDA devices:
Jun 24 18:16:54 strix ollama[15137]: Device 0: NVIDIA GeForce RTX 4060 Laptop GPU, compute capability 8.9, VMM: yes
Jun 24 18:16:54 strix ollama[15137]: load_backend: loaded CUDA backend from /usr/lib64/ollama/cuda_v12/libggml-cuda.so
Jun 24 18:16:54 strix ollama[15137]: time=2025-06-24T18:16:54.859+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.SSE3=1 CPU.1.SSSE3=1 CPU.1.AVX=1 CPU.1.AVX_VNNI=1 CPU.1.AVX2=1 CPU.1.F16C=1 CPU.1.FMA=1 CPU.1.BMI2=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=520 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
Jun 24 18:16:54 strix ollama[15137]: time=2025-06-24T18:16:54.859+02:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:40537"
Jun 24 18:16:54 strix ollama[15137]: llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Laptop GPU) - 7224 MiB free
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /var/lib/ollama/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 0: general.architecture str = qwen2
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 1: general.type str = model
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 3: general.finetune str = Instruct
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 5: general.size_label str = 7B
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 6: general.license str = apache-2.0
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 8: general.base_model.count u32 = 1
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"]
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 14: qwen2.block_count u32 = 28
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 15: qwen2.context_length u32 = 32768
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 22: general.file_type u32 = 15
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>...
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - kv 33: general.quantization_version u32 = 2
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - type f32: 141 tensors
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - type q4_K: 169 tensors
Jun 24 18:16:54 strix ollama[15137]: llama_model_loader: - type q6_K: 29 tensors
Jun 24 18:16:54 strix ollama[15137]: print_info: file format = GGUF V3 (latest)
Jun 24 18:16:54 strix ollama[15137]: print_info: file type = Q4_K - Medium
Jun 24 18:16:54 strix ollama[15137]: print_info: file size = 4.36 GiB (4.91 BPW)
Jun 24 18:16:55 strix ollama[15137]: time=2025-06-24T18:16:55.002+02:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Jun 24 18:16:55 strix ollama[15137]: load: special tokens cache size = 22
Jun 24 18:16:55 strix ollama[15137]: load: token to piece cache size = 0.9310 MB
Jun 24 18:16:55 strix ollama[15137]: print_info: arch = qwen2
Jun 24 18:16:55 strix ollama[15137]: print_info: vocab_only = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: n_ctx_train = 32768
Jun 24 18:16:55 strix ollama[15137]: print_info: n_embd = 3584
Jun 24 18:16:55 strix ollama[15137]: print_info: n_layer = 28
Jun 24 18:16:55 strix ollama[15137]: print_info: n_head = 28
Jun 24 18:16:55 strix ollama[15137]: print_info: n_head_kv = 4
Jun 24 18:16:55 strix ollama[15137]: print_info: n_rot = 128
Jun 24 18:16:55 strix ollama[15137]: print_info: n_swa = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: n_swa_pattern = 1
Jun 24 18:16:55 strix ollama[15137]: print_info: n_embd_head_k = 128
Jun 24 18:16:55 strix ollama[15137]: print_info: n_embd_head_v = 128
Jun 24 18:16:55 strix ollama[15137]: print_info: n_gqa = 7
Jun 24 18:16:55 strix ollama[15137]: print_info: n_embd_k_gqa = 512
Jun 24 18:16:55 strix ollama[15137]: print_info: n_embd_v_gqa = 512
Jun 24 18:16:55 strix ollama[15137]: print_info: f_norm_eps = 0.0e+00
Jun 24 18:16:55 strix ollama[15137]: print_info: f_norm_rms_eps = 1.0e-06
Jun 24 18:16:55 strix ollama[15137]: print_info: f_clamp_kqv = 0.0e+00
Jun 24 18:16:55 strix ollama[15137]: print_info: f_max_alibi_bias = 0.0e+00
Jun 24 18:16:55 strix ollama[15137]: print_info: f_logit_scale = 0.0e+00
Jun 24 18:16:55 strix ollama[15137]: print_info: f_attn_scale = 0.0e+00
Jun 24 18:16:55 strix ollama[15137]: print_info: n_ff = 18944
Jun 24 18:16:55 strix ollama[15137]: print_info: n_expert = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: n_expert_used = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: causal attn = 1
Jun 24 18:16:55 strix ollama[15137]: print_info: pooling type = -1
Jun 24 18:16:55 strix ollama[15137]: print_info: rope type = 2
Jun 24 18:16:55 strix ollama[15137]: print_info: rope scaling = linear
Jun 24 18:16:55 strix ollama[15137]: print_info: freq_base_train = 1000000.0
Jun 24 18:16:55 strix ollama[15137]: print_info: freq_scale_train = 1
Jun 24 18:16:55 strix ollama[15137]: print_info: n_ctx_orig_yarn = 32768
Jun 24 18:16:55 strix ollama[15137]: print_info: rope_finetuned = unknown
Jun 24 18:16:55 strix ollama[15137]: print_info: ssm_d_conv = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: ssm_d_inner = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: ssm_d_state = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: ssm_dt_rank = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: ssm_dt_b_c_rms = 0
Jun 24 18:16:55 strix ollama[15137]: print_info: model type = 7B
Jun 24 18:16:55 strix ollama[15137]: print_info: model params = 7.62 B
Jun 24 18:16:55 strix ollama[15137]: print_info: general.name = Qwen2.5 Coder 7B Instruct
Jun 24 18:16:55 strix ollama[15137]: print_info: vocab type = BPE
Jun 24 18:16:55 strix ollama[15137]: print_info: n_vocab = 152064
Jun 24 18:16:55 strix ollama[15137]: print_info: n_merges = 151387
Jun 24 18:16:55 strix ollama[15137]: print_info: BOS token = 151643 '<|endoftext|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOS token = 151645 '<|im_end|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOT token = 151645 '<|im_end|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: PAD token = 151643 '<|endoftext|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: LF token = 198 'Ċ'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM PRE token = 151659 '<|fim_prefix|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM SUF token = 151661 '<|fim_suffix|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM MID token = 151660 '<|fim_middle|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM PAD token = 151662 '<|fim_pad|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM REP token = 151663 '<|repo_name|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: FIM SEP token = 151664 '<|file_sep|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOG token = 151643 '<|endoftext|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOG token = 151645 '<|im_end|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOG token = 151662 '<|fim_pad|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOG token = 151663 '<|repo_name|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: EOG token = 151664 '<|file_sep|>'
Jun 24 18:16:55 strix ollama[15137]: print_info: max token length = 256
Jun 24 18:16:55 strix ollama[15137]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Jun 24 18:16:56 strix ollama[15137]: load_tensors: offloading 28 repeating layers to GPU
Jun 24 18:16:56 strix ollama[15137]: load_tensors: offloading output layer to GPU
Jun 24 18:16:56 strix ollama[15137]: load_tensors: offloaded 29/29 layers to GPU
Jun 24 18:16:56 strix ollama[15137]: load_tensors: CUDA0 model buffer size = 4168.09 MiB
Jun 24 18:16:56 strix ollama[15137]: load_tensors: CPU_Mapped model buffer size = 292.36 MiB
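
With all 29 layers offloaded, the two buffers above split the file almost exactly; the small CPU_Mapped remainder is presumably the token-embedding table, which stays mmap-ed on the host (an inference, not stated in the log):

    print((4168.09 + 292.36) / 1024)   # ~4.36 GiB, i.e. the whole GGUF file
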
Jun 24 18:16:57 strix ollama[15137]: llama_context: constructing llama_context
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_seq_max = 2
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_ctx = 16384
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_ctx_per_seq = 8192
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_batch = 1024
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_ubatch = 512
Jun 24 18:16:57 strix ollama[15137]: llama_context: causal_attn = 1
Jun 24 18:16:57 strix ollama[15137]: llama_context: flash_attn = 0
Jun 24 18:16:57 strix ollama[15137]: llama_context: freq_base = 1000000.0
Jun 24 18:16:57 strix ollama[15137]: llama_context: freq_scale = 1
Jun 24 18:16:57 strix ollama[15137]: llama_context: n_ctx_per_seq (8192) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
Jun 24 18:16:57 strix ollama[15137]: llama_context: CUDA_Host output buffer size = 1.19 MiB
Jun 24 18:16:57 strix ollama[15137]: llama_kv_cache_unified: kv_size = 16384, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1, padding = 32
Jun 24 18:16:57 strix ollama[15137]: llama_kv_cache_unified: CUDA0 KV buffer size = 896.00 MiB
Jun 24 18:16:57 strix ollama[15137]: llama_kv_cache_unified: KV self size = 896.00 MiB, K (f16): 448.00 MiB, V (f16): 448.00 MiB
Jun 24 18:16:57 strix ollama[15137]: llama_context: CUDA0 compute buffer size = 956.00 MiB
Jun 24 18:16:57 strix ollama[15137]: llama_context: CUDA_Host compute buffer size = 39.01 MiB
Jun 24 18:16:57 strix ollama[15137]: llama_context: graph nodes = 1042
Jun 24 18:16:57 strix ollama[15137]: llama_context: graph splits = 2
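
The KV-cache and context figures above follow directly from the model header and the runner flags (--ctx-size 16384 --parallel 2). A worked check, with every constant taken from earlier lines:

    n_layer      = 28         # qwen2.block_count
    n_embd_k_gqa = 4 * 128    # n_head_kv * n_embd_head_k = 512
    kv_size      = 16384      # --ctx-size
    f16          = 2          # bytes per element
    k_mib = n_layer * kv_size * n_embd_k_gqa * f16 / 2**20
    print(k_mib)              # 448.0 MiB for K; V is identical -> 896 MiB total
    print(kv_size // 2)       # n_ctx / parallel = 8192 = n_ctx_per_seq

This also explains the n_ctx_per_seq warning: each of the 2 parallel sequences gets 8192 tokens, well under the model's trained 32768.
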
Jun 24 18:16:57 strix ollama[15137]: time=2025-06-24T18:16:57.513+02:00 level=INFO source=server.go:630 msg="llama runner started in 3.77 seconds"
Jun 24 18:17:02 strix ollama[15137]: [GIN] 2025/06/24 - 18:17:02 | 200 | 9.672241643s | 127.0.0.1 | POST "/api/chat"
Jun 24 18:17:04 strix ollama[15137]: [GIN] 2025/06/24 - 18:17:04 | 200 | 14.325µs | 127.0.0.1 | HEAD "/"
Jun 24 18:17:04 strix ollama[15137]: [GIN] 2025/06/24 - 18:17:04 | 200 | 20.378µs | 127.0.0.1 | GET "/api/ps"
Jun 24 18:24:55 strix ollama[15137]: [GIN] 2025/06/24 - 18:24:55 | 200 | 19.276µs | 127.0.0.1 | HEAD "/"
Jun 24 18:24:55 strix ollama[15137]: [GIN] 2025/06/24 - 18:24:55 | 200 | 29.781µs | 127.0.0.1 | GET "/api/ps"
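
The 9.67 s POST /api/chat at 18:17:02 is the first generation against the freshly loaded model. An illustrative way to issue such a request — the model tag qwen2.5-coder:7b is an assumption; substitute whatever your /api/tags output reports:

    import json, urllib.request

    # Non-streaming chat request against the local daemon (sketch).
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/chat",
        data=json.dumps({
            "model": "qwen2.5-coder:7b",   # assumed tag for the model above
            "messages": [{"role": "user", "content": "Hello"}],
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["message"]["content"])
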