Created June 24, 2025 21:13
Jun 24 23:09:00 dbox2 systemd[1]: Started Ollama Service.
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.473+02:00 level=INFO source=routes.go:1234 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.477+02:00 level=INFO source=images.go:479 msg="total blobs: 16"
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.478+02:00 level=INFO source=images.go:486 msg="total unused blobs removed: 0"
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Jun 24 23:09:00 dbox2 ollama[213509]: - using env: export GIN_MODE=release
Jun 24 23:09:00 dbox2 ollama[213509]: - using code: gin.SetMode(gin.ReleaseMode)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] HEAD / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET / --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] HEAD /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET /api/version --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/pull --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/push --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] HEAD /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET /api/tags --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/show --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] DELETE /api/delete --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/create --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] HEAD /api/blobs/:digest --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/copy --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET /api/ps --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/generate --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/chat --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/embed --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /api/embeddings --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
Jun 24 23:09:00 dbox2 ollama[213509]: [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
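The route dump above is the full HTTP surface the server exposes. As a quick orientation, here is a minimal Python sketch (standard library only) that exercises a few of the listed endpoints against the address this log reports (127.0.0.1:11434); the model name passed to /api/show is a placeholder, not something this log confirms.

# Sketch: probe a few of the routes registered above on the logged address.
import json
import urllib.request

BASE = "http://127.0.0.1:11434"

def get(path):
    with urllib.request.urlopen(BASE + path) as r:
        return json.load(r)

def post(path, payload):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

print(get("/api/version"))                               # GET /api/version
print([m["name"] for m in get("/api/tags")["models"]])   # GET /api/tags
# POST /api/show returns the kind of metadata dumped later in this log;
# "deepseek-coder:1.3b" is a placeholder -- substitute a name from /api/tags:
# print(post("/api/show", {"model": "deepseek-coder:1.3b"}))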
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.480+02:00 level=INFO source=routes.go:1287 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.480+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
Jun 24 23:09:00 dbox2 ollama[213509]: time=2025-06-24T23:09:00.827+02:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-aaccb9a7-ed1d-b91a-5770-605091ede72b library=cuda variant=v12 compute=5.0 driver=12.9 name="NVIDIA GeForce 920MX" total="1.9 GiB" available="1.9 GiB"
Jun 24 23:09:06 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:09:06 | 200 | 134.194µs | 127.0.0.1 | HEAD "/"
Jun 24 23:09:06 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:09:06 | 200 | 273.809µs | 127.0.0.1 | GET "/api/ps"
Jun 24 23:10:12 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:12 | 200 | 34.807µs | 127.0.0.1 | HEAD "/"
Jun 24 23:10:12 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:12 | 200 | 3.667782ms | 127.0.0.1 | GET "/api/tags"
Jun 24 23:10:50 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:50 | 200 | 1.518786ms | 127.0.0.1 | GET "/api/tags"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 2.058446ms | 127.0.0.1 | GET "/api/tags"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 95.08694ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 107.854241ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 192.909776ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 205.766749ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 49.80928ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 73.386806ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 113.065893ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:10:51 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:10:51 | 200 | 149.997326ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:11:00 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:11:00 | 200 | 2.260427ms | 127.0.0.1 | GET "/api/tags"
Jun 24 23:11:00 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:11:00 | 200 | 55.886784ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:11:00 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:11:00 | 200 | 86.952933ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:11:00 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:11:00 | 200 | 168.358275ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:11:00 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:11:00 | 200 | 190.893861ms | 127.0.0.1 | POST "/api/show"
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.801+02:00 level=INFO source=server.go:135 msg="system memory" total="15.1 GiB" free="11.4 GiB" free_swap="8.0 GiB"
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.802+02:00 level=INFO source=server.go:168 msg=offload library=cuda layers.requested=-1 layers.model=25 layers.offload=11 layers.split="" memory.available="[1.9 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.1 GiB" memory.required.partial="1.9 GiB" memory.required.kv="1.5 GiB" memory.required.allocations="[1.9 GiB]" memory.weights.total="703.4 MiB" memory.weights.repeating="651.8 MiB" memory.weights.nonrepeating="51.7 MiB" memory.graph.full="288.0 MiB" memory.graph.partial="346.3 MiB"
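The offload line above is the scheduler's sizing decision: the full model would need 3.1 GiB but only 1.9 GiB of VRAM is available on the 920MX, so 11 of the 25 layers (24 transformer blocks plus the output layer) go to the GPU. Below is a naive back-of-the-envelope check using only the figures from that line; the real estimator also reserves CUDA runtime and scratch overhead, which is why it settles on 11 rather than the slightly higher count this simple sum would allow.

# Naive re-derivation of the offload estimate from the logged figures (MiB).
weights_repeating = 651.8      # memory.weights.repeating, over 24 blocks
weights_nonrepeating = 51.7    # memory.weights.nonrepeating
kv_total = 1.5 * 1024          # memory.required.kv for all 24 layers
graph_partial = 346.3          # memory.graph.partial
available = 1.9 * 1024         # memory.available

per_layer = (weights_repeating + kv_total) / 24
for n in (11, 24):
    need = n * per_layer + weights_nonrepeating + graph_partial
    print(f"{n} layers -> ~{need:.0f} MiB "
          f"({'fits' if need <= available else 'exceeds VRAM'})")
# 11 layers -> ~1401 MiB (fits); 24 layers -> ~2586 MiB (exceeds VRAM)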
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: loaded meta data with 26 key-value pairs and 219 tensors from /var/lib/ollama/.ollama/models/blobs/sha256-d040cc18521592f70c199396aeaa44cdc40224079156dc09d4283d745d9dc5fd (version GGUF V3 (latest))
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 0: general.architecture str = llama
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 1: general.name str = deepseek-ai
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 2: llama.context_length u32 = 16384
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 4: llama.block_count u32 = 24
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5504
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 7: llama.attention.head_count u32 = 16
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 16
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 10: llama.rope.freq_base f32 = 100000.000000
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 11: llama.rope.scaling.type str = linear
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 12: llama.rope.scaling.factor f32 = 4.000000
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 13: general.file_type u32 = 2
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32256] = [0.000000, 0.000000, 0.000000, 0.0000...
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,31757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 32013
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 32021
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 32014
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - kv 25: general.quantization_version u32 = 2
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - type f32: 49 tensors
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - type q4_0: 169 tensors
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_loader: - type q6_K: 1 tensors
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: file format = GGUF V3 (latest)
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: file type = Q4_0
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: file size = 738.88 MiB (4.60 BPW)
Jun 24 23:11:13 dbox2 ollama[213509]: load: missing or unrecognized pre-tokenizer type, using: 'default'
Jun 24 23:11:13 dbox2 ollama[213509]: load: control-looking token: 32015 '<|fim▁hole|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:13 dbox2 ollama[213509]: load: control-looking token: 32017 '<|fim▁end|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:13 dbox2 ollama[213509]: load: control-looking token: 32016 '<|fim▁begin|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:13 dbox2 ollama[213509]: load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Jun 24 23:11:13 dbox2 ollama[213509]: load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
Jun 24 23:11:13 dbox2 ollama[213509]: load: special tokens cache size = 256
Jun 24 23:11:13 dbox2 ollama[213509]: load: token to piece cache size = 0.1792 MB
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: arch = llama
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: vocab_only = 1
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: model type = ?B
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: model params = 1.35 B
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: general.name = deepseek-ai
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: vocab type = BPE
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: n_vocab = 32256
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: n_merges = 31757
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: BOS token = 32013 '<|begin▁of▁sentence|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: EOS token = 32021 '<|EOT|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: EOT token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: PAD token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: LF token = 185 'Ċ'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: FIM PRE token = 32016 '<|fim▁begin|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: FIM SUF token = 32015 '<|fim▁hole|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: FIM MID token = 32017 '<|fim▁end|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: EOG token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: EOG token = 32021 '<|EOT|>'
Jun 24 23:11:13 dbox2 ollama[213509]: print_info: max token length = 128
Jun 24 23:11:13 dbox2 ollama[213509]: llama_model_load: vocab only - skipping tensors
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.974+02:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/bin/ollama runner --model /var/lib/ollama/.ollama/models/blobs/sha256-d040cc18521592f70c199396aeaa44cdc40224079156dc09d4283d745d9dc5fd --ctx-size 8192 --batch-size 512 --n-gpu-layers 11 --threads 2 --parallel 1 --port 40727"
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.975+02:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.975+02:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Jun 24 23:11:13 dbox2 ollama[213509]: time=2025-06-24T23:11:13.975+02:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
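The runner command above fixes --ctx-size 8192 and --n-gpu-layers 11 for this load. Both knobs can also be requested per call through the API as the options num_ctx and num_gpu; a hedged sketch, with a placeholder model name (pick a real one from /api/tags):

# Sketch: request a different context size / GPU layer count per call.
# "num_ctx" and "num_gpu" are the Ollama option names for the runner's
# --ctx-size and --n-gpu-layers; the model name below is a placeholder.
import json
import urllib.request

payload = {
    "model": "deepseek-coder:1.3b",   # assumption: substitute from /api/tags
    "prompt": "Write a hello world program in Go.",
    "stream": False,
    "options": {"num_ctx": 4096, "num_gpu": 8},  # smaller ctx and offload
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["response"])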
Jun 24 23:11:14 dbox2 ollama[213509]: time=2025-06-24T23:11:14.003+02:00 level=INFO source=runner.go:815 msg="starting go runner"
Jun 24 23:11:14 dbox2 ollama[213509]: load_backend: loaded CPU backend from /usr/lib64/ollama/libggml-cpu-x64.so
Jun 24 23:11:14 dbox2 ollama[213509]: time=2025-06-24T23:11:14.013+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.SSE3=1 CPU.1.SSSE3=1 CPU.1.AVX=1 CPU.1.AVX2=1 CPU.1.F16C=1 CPU.1.FMA=1 CPU.1.BMI2=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
Jun 24 23:11:14 dbox2 ollama[213509]: time=2025-06-24T23:11:14.020+02:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:40727"
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: loaded meta data with 26 key-value pairs and 219 tensors from /var/lib/ollama/.ollama/models/blobs/sha256-d040cc18521592f70c199396aeaa44cdc40224079156dc09d4283d745d9dc5fd (version GGUF V3 (latest))
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 0: general.architecture str = llama
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 1: general.name str = deepseek-ai
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 2: llama.context_length u32 = 16384
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 4: llama.block_count u32 = 24
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5504
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 7: llama.attention.head_count u32 = 16
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 16
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 10: llama.rope.freq_base f32 = 100000.000000
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 11: llama.rope.scaling.type str = linear
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 12: llama.rope.scaling.factor f32 = 4.000000
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 13: general.file_type u32 = 2
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32256] = [0.000000, 0.000000, 0.000000, 0.0000...
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,31757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 32013
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 32021
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 32014
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - kv 25: general.quantization_version u32 = 2
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - type f32: 49 tensors
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - type q4_0: 169 tensors
Jun 24 23:11:14 dbox2 ollama[213509]: llama_model_loader: - type q6_K: 1 tensors
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: file format = GGUF V3 (latest)
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: file type = Q4_0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: file size = 738.88 MiB (4.60 BPW)
Jun 24 23:11:14 dbox2 ollama[213509]: load: missing or unrecognized pre-tokenizer type, using: 'default'
Jun 24 23:11:14 dbox2 ollama[213509]: load: control-looking token: 32015 '<|fim▁hole|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:14 dbox2 ollama[213509]: load: control-looking token: 32017 '<|fim▁end|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:14 dbox2 ollama[213509]: load: control-looking token: 32016 '<|fim▁begin|>' was not control-type; this is probably a bug in the model. its type will be overridden
Jun 24 23:11:14 dbox2 ollama[213509]: load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Jun 24 23:11:14 dbox2 ollama[213509]: load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect
Jun 24 23:11:14 dbox2 ollama[213509]: load: special tokens cache size = 256
Jun 24 23:11:14 dbox2 ollama[213509]: load: token to piece cache size = 0.1792 MB
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: arch = llama
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: vocab_only = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_ctx_train = 16384
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_embd = 2048
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_layer = 24
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_head = 16
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_head_kv = 16
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_rot = 128
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_swa = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_swa_pattern = 1
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_embd_head_k = 128
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_embd_head_v = 128
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_gqa = 1
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_embd_k_gqa = 2048
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_embd_v_gqa = 2048
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_norm_eps = 0.0e+00
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_norm_rms_eps = 1.0e-06
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_clamp_kqv = 0.0e+00
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_max_alibi_bias = 0.0e+00
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_logit_scale = 0.0e+00
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: f_attn_scale = 0.0e+00
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_ff = 5504
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_expert = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_expert_used = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: causal attn = 1
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: pooling type = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: rope type = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: rope scaling = linear
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: freq_base_train = 100000.0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: freq_scale_train = 0.25
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_ctx_orig_yarn = 16384
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: rope_finetuned = unknown
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: ssm_d_conv = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: ssm_d_inner = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: ssm_d_state = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: ssm_dt_rank = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: ssm_dt_b_c_rms = 0
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: model type = ?B
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: model params = 1.35 B
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: general.name = deepseek-ai
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: vocab type = BPE
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_vocab = 32256
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: n_merges = 31757
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: BOS token = 32013 '<|begin▁of▁sentence|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: EOS token = 32021 '<|EOT|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: EOT token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: PAD token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: LF token = 185 'Ċ'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: FIM PRE token = 32016 '<|fim▁begin|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: FIM SUF token = 32015 '<|fim▁hole|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: FIM MID token = 32017 '<|fim▁end|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: EOG token = 32014 '<|end▁of▁sentence|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: EOG token = 32021 '<|EOT|>'
Jun 24 23:11:14 dbox2 ollama[213509]: print_info: max token length = 128
Jun 24 23:11:14 dbox2 ollama[213509]: load_tensors: loading model tensors, this can take a while... (mmap = true)
Jun 24 23:11:14 dbox2 ollama[213509]: time=2025-06-24T23:11:14.229+02:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
Jun 24 23:11:15 dbox2 ollama[213509]: load_tensors: CPU_Mapped model buffer size = 738.88 MiB
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: constructing llama_context
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_seq_max = 1
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_ctx = 8192
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_ctx_per_seq = 8192
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_batch = 512
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_ubatch = 512
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: causal_attn = 1
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: flash_attn = 0
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: freq_base = 100000.0
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: freq_scale = 0.25
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: n_ctx_per_seq (8192) < n_ctx_train (16384) -- the full capacity of the model will not be utilized
Jun 24 23:11:15 dbox2 ollama[213509]: llama_context: CPU output buffer size = 0.13 MiB
Jun 24 23:11:15 dbox2 ollama[213509]: llama_kv_cache_unified: kv_size = 8192, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1, padding = 32
Jun 24 23:11:18 dbox2 ollama[213509]: llama_kv_cache_unified: CPU KV buffer size = 1536.00 MiB
Jun 24 23:11:18 dbox2 ollama[213509]: llama_kv_cache_unified: KV self size = 1536.00 MiB, K (f16): 768.00 MiB, V (f16): 768.00 MiB
Jun 24 23:11:18 dbox2 ollama[213509]: llama_context: CPU compute buffer size = 288.01 MiB
Jun 24 23:11:18 dbox2 ollama[213509]: llama_context: graph nodes = 822
Jun 24 23:11:18 dbox2 ollama[213509]: llama_context: graph splits = 1
Jun 24 23:11:18 dbox2 ollama[213509]: time=2025-06-24T23:11:18.247+02:00 level=INFO source=server.go:630 msg="llama runner started in 4.27 seconds"
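The KV-cache figures a few lines up follow directly from the model shape printed earlier (n_layer=24, n_ctx=8192, n_embd_k_gqa = n_embd_v_gqa = 2048) at 2 bytes per f16 element, once for K and once for V:

# Check of the KV self size reported above.
n_layer, n_ctx, n_embd_gqa, f16_bytes = 24, 8192, 2048, 2
k_mib = n_layer * n_ctx * n_embd_gqa * f16_bytes / 2**20
print(k_mib, 2 * k_mib)   # 768.0 MiB for K, 1536.0 MiB total -- matches the log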
Jun 24 23:12:43 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:12:43 | 200 | 35.625µs | 127.0.0.1 | HEAD "/"
Jun 24 23:12:43 dbox2 ollama[213509]: [GIN] 2025/06/24 - 23:12:43 | 200 | 41.515µs | 127.0.0.1 | GET "/api/ps"
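The closing HEAD "/" + GET /api/ps pair is the pattern a client (likely the ollama CLI running `ollama ps`) uses to list loaded runners. A sketch of reading it directly; the size_vram/size split reflects the partial GPU offload seen above:

# List loaded models, the same call the final log lines show being served.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:11434/api/ps") as r:
    for m in json.load(r)["models"]:
        print(m["name"], m["size"], m.get("size_vram"))  # bytes total vs on-GPU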