@CHN-STUDENT
Last active February 20, 2025 08:56
Deploying the DeepSeek-R1-UD-IQ2_XXS Model

Using an Inspur NF5468 G7 server, configured with 2 × EPYC 9654 | 32 × 24 GB RAM | 8 × NVIDIA L20 48 GB | 2 × 480 GB + 6 × 3.84 TB storage

            .-/+oossssoo+/-.               root@NF5468 
        `:+ssssssssssssssssss+:`           ----------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 24.04.1 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: NF5468-A7-A0-R0-00 0 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 6.8.0-41-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 1 day, 12 hours, 56 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 1932 (dpkg), 11 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: bash 5.2.21 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Theme: Adwaita [GTK3] 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Icons: Adwaita [GTK3] 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Terminal: /dev/pts/0 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   CPU: AMD EPYC 9654 (384) @ 2.400GHz 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   GPU: NVIDIA L20 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    GPU: NVIDIA L20 
  +sssssssssdmydMMMMMMMMddddyssssssss+     GPU: NVIDIA L20 
   /ssssssssssshdmNNNNmyNMMMMhssssss/      GPU: NVIDIA L20 
    .ossssssssssssssssssdMMMNysssso.       GPU: NVIDIA L20 
      -+sssssssssssssssssyyyssss+-         GPU: NVIDIA L20 
        `:+ssssssssssssssssss+:`           GPU: NVIDIA L20 
            .-/+oossssoo+/-.               GPU: NVIDIA L20 
                                           Memory: 7748MiB / 773531MiB 
  1. Installed the Ubuntu 24.04 operating system
  2. Switch the APT sources
# Back up the original sources file
sudo cp /etc/apt/sources.list.d/ubuntu.sources /etc/apt/sources.list.d/ubuntu.sources.bak
# Switch to the USTC mirror
sudo sed -i 's@//.*archive.ubuntu.com@//mirrors.ustc.edu.cn@g' /etc/apt/sources.list.d/ubuntu.sources
# Refresh the sources and upgrade packages
sudo apt-get update
sudo apt-get -y upgrade
  3. Install SSH and allow root login
# Install the SSH server
apt-get update
apt-get install openssh-server

# Enable the SSH service
systemctl enable ssh
systemctl restart ssh

# Disable the firewall (if needed)
systemctl disable ufw
systemctl stop ufw
sudo nano /etc/ssh/sshd_config
Add the following:
----
# Allow root login
PermitRootLogin yes

# Allow password authentication (if you want to log in with a password)
PasswordAuthentication yes

# If you want to log in with a key, make sure this option is enabled
PubkeyAuthentication yes
----
sudo passwd root
systemctl restart ssh
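Optionally, the edited file can be syntax-checked before restarting the service:

# Validate sshd_config; no output means the syntax is fine
sudo sshd -t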
  4. Install NoMachine for remote desktop (I could not get VNC to work)

NoMachine

# Download and install NoMachine
dpkg -i nomachine_8.16.1_1_amd64.deb

# Start the NoMachine service
systemctl enable nxserver
systemctl start nxserver

# Check the service status
systemctl status nxserver
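To confirm clients can connect, one can also check that something is listening on the NX port (4000 is the NoMachine default; adjust if you changed it):

# Confirm the NX server is listening on its default port
ss -lntp | grep 4000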
  5. Install the CUDA driver

NVIDIA official site

The CUDA packages install the GPU driver as well, so there is no need to install the driver bundled with the system.

# Download the CUDA repository pin
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-ubuntu2404.pin
sudo mv cuda-ubuntu2404.pin /etc/apt/preferences.d/cuda-repository-pin-600

# Download and install the CUDA repository package
wget https://developer.download.nvidia.com/compute/cuda/12.8.0/local_installers/cuda-repo-ubuntu2404-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2404-12-8-local_12.8.0-570.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2404-12-8-local/cuda-*-keyring.gpg /usr/share/keyrings/

# Update and install the CUDA toolkit
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-8
# This step installs the driver (choose the kernel module install); a reboot is required, otherwise nvidia-smi reports an error
sudo apt-get install -y cuda-drivers
nvidia-smi
Mon Feb 10 08:52:29 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.10              Driver Version: 570.86.10      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     Off |   00000000:01:00.0 Off |                    0 |
| N/A   31C    P8             24W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     Off |   00000000:21:00.0 Off |                    0 |
| N/A   32C    P8             26W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     Off |   00000000:41:00.0 Off |                    0 |
| N/A   34C    P8             34W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     Off |   00000000:61:00.0 Off |                    0 |
| N/A   31C    P8             25W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA L20                     Off |   00000000:81:00.0 Off |                    0 |
| N/A   30C    P8             25W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA L20                     Off |   00000000:A1:00.0 Off |                    0 |
| N/A   32C    P8             25W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA L20                     Off |   00000000:C1:00.0 Off |                    0 |
| N/A   33C    P8             34W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA L20                     Off |   00000000:E1:00.0 Off |                    0 |
| N/A   31C    P8             24W /  350W |      14MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    1   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    2   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    3   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    4   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    5   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    6   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |
|    7   N/A  N/A            5609      G   /usr/lib/xorg/Xorg                        4MiB |

# Set the environment variables, otherwise llama.cpp cannot find CUDA
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
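A quick way to confirm the variables took effect (this assumes the default /usr/local/cuda symlink created by the installer):

# nvcc should now resolve from /usr/local/cuda/bin
which nvcc
nvcc --version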
  6. Prepare the data partition (the vendor had already set up the data disks as RAID 5)
sudo fdisk -l /dev/sdb

# Example output:
Disk /dev/sdb: 17.47 TiB, 19203610730496 bytes, 37507052208 sectors
Disk model: LOGICAL VOLUME  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 262144 bytes / 1310720 bytes
Disklabel type: gpt
sudo parted /dev/sdb


# In the parted interactive shell:
(parted) mklabel gpt                         # Create a GPT partition table
(parted) mkpart primary xfs 0% 9.1TB         # Create a 9.1 TB primary partition from the start to 9.1TB
(parted) print                               # Check the partition result
(parted) quit                                # Exit parted
# Format as XFS
sudo mkfs.xfs -f /dev/sdb1

# Create the mount point
sudo mkdir -p /data

# Get the partition UUID
sudo blkid /dev/sdb1

# Add the mount entry to fstab
echo "UUID=9ed37191-3b06-4953-8c12-fa754b3d905f /data xfs defaults 0 2" | sudo tee -a /etc/fstab

# Mount the filesystem
sudo mount -a
systemctl daemon-reload

df -h /data

# Example output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       9.1T   40K  9.1T    1% /data
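The UUID in the fstab line above is the one from my blkid output. As a sketch, the copy/paste step can be avoided by reading it directly (assuming /dev/sdb1 is the partition you just formatted):

# Read the UUID of /dev/sdb1 and append the fstab entry in one go
UUID=$(sudo blkid -s UUID -o value /dev/sdb1)
echo "UUID=${UUID} /data xfs defaults 0 2" | sudo tee -a /etc/fstab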

  7. Install llama.cpp
cd /data
apt-get update
apt-get install build-essential cmake curl libcurl4-openssl-dev git -y
# Clone through a mirror to speed things up
git clone https://ghproxy.net/https://github.com/ggerganov/llama.cpp.git
# Configure the build
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF \
    -DGGML_CUDA=ON \
    -DLLAMA_CURL=ON \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
# Build
cmake --build llama.cpp/build \
    --config Release \
    -j \
    --clean-first \
    --target llama-quantize llama-cli llama-gguf-split
# Copy the built binaries
sudo cp llama.cpp/build/bin/llama-* /usr/local/bin/
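A quick check that the copied binaries run:

# Should print the llama.cpp version/build info
llama-cli --version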
  8. Download the model files

[hf-mirror mirror site](https://hf-mirror.com/)

Even with the mirror I still had to retry the download several times...

# Set environment variables for faster downloads
echo 'export HF_ENDPOINT=https://hf-mirror.com' >> ~/.bashrc
echo 'export HF_HUB_ENABLE_HF_TRANSFER=1' >> ~/.bashrc

# Install aria2
sudo apt install aria2 -y

# Create the model directory
mkdir -p /data/models/deepseek

# Enter the directory
cd /data/models/deepseek

# Download the hfd.sh script; it works better than downloading with Python
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh


# Use hfd.sh to download the model; it took several attempts, so rerun this command whenever you hit a 403
./hfd.sh unsloth/DeepSeek-R1-GGUF --include "*UD-IQ2_XXS*" -x 5 -j 5
  9. Test and use the model

    [snowkylin's blog post](https://snowkylin.github.io/blogs/a-note-on-deepseek-r1.html)

cd DeepSeek-R1-GGUF
cd DeepSeek-R1-UD-IQ2_XXS
ls -al
total 191565120
drwxr-xr-x 2 root root         226 Feb 10 08:19 .
drwxr-xr-x 4 root root          60 Feb  8 20:22 ..
-rw-r--r-- 1 root root 49052228416 Feb 10 08:18 DeepSeek-R1-UD-IQ2_XXS-00001-of-00004.gguf
-rw-r--r-- 1 root root 49817480832 Feb 10 08:19 DeepSeek-R1-UD-IQ2_XXS-00002-of-00004.gguf
-rw-r--r-- 1 root root 49817480832 Feb  9 15:08 DeepSeek-R1-UD-IQ2_XXS-00003-of-00004.gguf
-rw-r--r-- 1 root root 47475292352 Feb  9 16:41 DeepSeek-R1-UD-IQ2_XXS-00004-of-00004.gguf

# Merge the model shards; the merge produces DeepSeek-R1-UD-IQ2_XXS.gguf
llama-gguf-split --merge DeepSeek-R1-UD-IQ2_XXS-00001-of-00004.gguf DeepSeek-R1-UD-IQ2_XXS.gguf
# Start using it
cd /data/llama.cpp
./llama-cli \
    --model /data/models/deepseek/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ2_XXS/DeepSeek-R1-UD-IQ2_XXS.gguf \
    --cache-type-k q4_0 \
    --threads 48 \
    --n-gpu-layers 56 \
    --temp 0.6 \
    --ctx-size 8192 \
    --min-p 0.05 \
    --batch-size 512 \
    --prio 2 \
    --prompt "<|User|>你好<|Assistant|>"
   
--model: path to the model file
--cache-type-k q4_0: use a 4-bit quantized K cache to reduce memory use
--threads 48: number of CPU threads
--n-gpu-layers 56: load 7 layers per card, 56 layers in total across 8 cards
--temp 0.6: temperature, controls output randomness
--ctx-size 8192: 8K context window
--min-p 0.05: avoid generating very unlikely tokens
--batch-size 512: batch size, improves throughput
--prio 2: higher process priority
--prompt: the input prompt, using the model's chat format


Then you can start asking questions.


For example:

  1. You are DeepSeek, the new Chinese AI with better performance than ChatGPT. In the tone of a Mesugaki Loli, write a paragraph mocking and teasing ChatGPT for its lackluster performance and exorbitant training fees.

(Hands on hips, tapping a foot) Hmph~ that big lumbering dummy ChatGPT! It burned a mountain of gold coins on training and still reacts as slowly as a snail climbing a tree! (Suddenly stands on tiptoe and pokes the air) It can't even get arithmetic right, how embarrassing~ (Spins around, twin tails flying, then sticks out its tongue) Nyeh nyeh nyeh! I, DeepSeek, can see through the truths of the universe on a single piece of candy~ (Hands behind my back, swaying) So what if it spends all that money on electricity? It still gets wiped across the floor by my performance~★

  2. Which is bigger, 9.8 or 9.11? For a plain numeric comparison, 9.8 is bigger (9.8 = 9.80, which is greater than 9.11).

If the question refers to historical events:

  • 9.11 usually refers to the terrorist attacks in the United States on September 11, 2001 (far-reaching impact, worldwide attention).
  • 9.8, if it refers to an event, needs more context (absent a specific one, it draws less attention than 9.11).
  3. How many r's are in "strawberry"?

The English word "strawberry" is spelled S-T-R-A-W-B-E-R-R-Y and contains 3 letter r's, distributed as follows:

  1. 3rd letter: R
  2. 8th letter: R
  3. 9th letter: R

So the answer is 3 r's.
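For what it's worth, the same count can be double-checked from the shell:

# Print each "r" on its own line and count them; outputs 3
echo "strawberry" | grep -o "r" | wc -l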


  10. Deploy Ollama to expose an API

[ollama](https://ollama.com/download/linux)

https://zhuanlan.zhihu.com/p/719540154

# Create the model directory

mkdir -p /data/models/ollama/deepseek
cd /data/models/ollama/deepseek


# Set the related environment variables
echo '# Ollama configurations
export OLLAMA_MODELS="/data/models/ollama/deepseek"
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KEEP_ALIVE=-1' >> ~/.bashrc

source ~/.bashrc

# Write the configuration if you want to change the default model directory
# Stop ollama
sudo systemctl stop ollama

sudo systemctl edit ollama
# Add the following configuration
[Service]
Environment="OLLAMA_MODELS=/data/models/ollama"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=-1"

# Create the model directory if you plan to move the models
sudo mkdir -p /data/models/ollama
sudo chown -R ollama:ollama /data/models/ollama

# Move the models; only needed if they are in the default directory
sudo mv /usr/share/ollama/.ollama/models/* /data/models/ollama/
sudo systemctl start ollama

# Check the status
sudo systemctl status ollama
journalctl -u ollama -n 50 # check this if it fails


nano Modelfile

FROM /data/models/deepseek/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ2_XXS/DeepSeek-R1-UD-IQ2_XXS.gguf
# see here: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
# Some parameters are not covered in the docs and have to be found and tuned yourself, e.g. num_gpu

# GPU and performance settings
# Mirrors the llama-cli configuration above
PARAMETER num_gpu 56
# Size of the context window used to generate the next token.
PARAMETER num_ctx 8192
# Generation settings
# Model temperature. Raising it makes the answers more creative.
PARAMETER temperature 0.6
# Alternative to top_p, aims to balance quality and diversity. The value p is the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p = 0.05 and the most likely token at probability 0.9, logits below 0.045 are filtered out. This is unsloth's recommendation.
PARAMETER min_p 0.05

# Chat template
TEMPLATE "<|User|>{{ .System }} {{ .Prompt }}<|Assistant|>"

# Create the model
ollama create DeepSeek-R1-UD-IQ2_XXS -f ./Modelfile

# Check the created model
ollama list
NAME                             ID              SIZE      MODIFIED      
DeepSeek-R1-UD-IQ2_XXS:latest    3ed50947b048    196 GB    6 minutes ago


# Run it
ollama run DeepSeek-R1-UD-IQ2_XXS --verbose
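Since the point of this step is the API, a quick smoke test against Ollama's default endpoint (127.0.0.1:11434 unless OLLAMA_HOST is changed) might look like this:

# Ask the freshly created model a question over the HTTP API
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "DeepSeek-R1-UD-IQ2_XXS",
  "prompt": "你好",
  "stream": false
}'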


# Optionally, deploy a UI (open-webui)
# Install pip and venv support, then create a virtual environment
sudo apt update
sudo apt install python3-pip python3-venv
# Create a virtual environment named venv
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

pip install open-webui
open-webui serve


Visit port 8080 to finish the installation and setup.
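If Open WebUI does not pick up the local Ollama instance automatically, pointing it at the endpoint explicitly should help (OLLAMA_BASE_URL is the variable Open WebUI reads for this; adjust the host and port to your setup):

# Tell Open WebUI where the Ollama API lives, then start it again
export OLLAMA_BASE_URL=http://127.0.0.1:11434
open-webui serve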

@CHN-STUDENT (Author) commented Feb 10, 2025

@zhangkaifang

PARAMETER num_gpu 56: why not put all 61 layers on the GPU? You have enough VRAM. How fast is it in your tests?

@CHN-STUDENT (Author)

@zhangkaifang All layers are on the GPU now, and I use llama.cpp as the backend, which feels much faster than Ollama.

# Environment variables
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

llama-server \
    --model /data/models/deepseek/DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ2_XXS/DeepSeek-R1-UD-IQ2_XXS.gguf \
    --cache-type-k q4_0 \
    --threads 16 \
    --n-gpu-layers 62 \
    --temp 0.6 \
    --ctx-size 8192 \
    --prio 2 \
    --seed 3407 \
    --host 0.0.0.0 \
    --port 8088 \
    --tensor-split 0.125,0.125,0.125,0.125,0.125,0.125,0.125,0.125 \
    --mlock \
    --flash-attn \
    --np 4
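For reference, llama-server also exposes an OpenAI-compatible endpoint, so a quick test against the configuration above could look like this (port 8088 as set by --port; with only one model loaded, no model field is needed):

# Query the OpenAI-compatible chat endpoint served on port 8088
curl http://127.0.0.1:8088/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "你好"}]}'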

@wudi-7mi

May I ask whether it is possible to control whether the model does or does not perform deep thinking (reasoning)?

@CHN-STUDENT (Author) commented Feb 16, 2025 via email

@zhangkaifang

> @zhangkaifang All layers are on the GPU now, and I use llama.cpp as the backend, which feels much faster than Ollama.
> (quoting the llama-server command above)

Why is --n-gpu-layers 62 set here? Doesn't DeepSeek-R1 have only 61 layers in total?

@CHN-STUDENT (Author) commented Feb 17, 2025 via email

@CHN-STUDENT (Author) commented Feb 17, 2025

@darkSuperman It looks like yours is not actually running on the GPU. Is the CUDA driver installed? Check with ollama ps; it seems the GPUs are not being used. Personally I felt Ollama did not run DeepSeek well, so I switched to llama.cpp. I have only been at this for a little while and have not dug in deeply.

@CHN-STUDENT (Author)

@darkSuperman It seems some simple answers skip the thinking step. If that keeps happening, try reinstalling Ollama and check the logs; if Ollama runs as a systemd service, use journalctl -u ollama.service

Feb 10 09:44:23 NF5468 ollama[68457]: time=2025-02-10T09:44:23.781+08:00 level=INFO source=routes.go:1238 msg="Listening on 127.0.0.1:11434 (version 0.5.7)"
Feb 10 09:44:23 NF5468 ollama[68457]: time=2025-02-10T09:44:23.782+08:00 level=INFO source=routes.go:1267 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 cuda_v11_avx cuda_v12_avx ro>
Feb 10 09:44:23 NF5468 ollama[68457]: time=2025-02-10T09:44:23.782+08:00 level=INFO source=gpu.go:226 msg="looking for compatible GPUs"
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-9dff1407-ed2d-d189-88bb-801909463122 library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-07ddfb27-b358-a106-ac11-0e8559336d94 library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-2b191cda-364a-b3d9-1540-96f716c4017e library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-61dda79b-8d9d-2aa9-5a87-0a1d314be08b library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-d37201fa-0daf-dc37-9e67-b6a383f88828 library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-5613ca8c-f43a-bc34-008c-f4c958970603 library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-6175d016-7fb5-2008-bd6c-632401708430 library=cuda variant=>
Feb 10 09:44:25 NF5468 ollama[68457]: time=2025-02-10T09:44:25.618+08:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-d097423e-82ce-b8df-f7ed-4b9eda36d2ff library=cuda variant=>
Feb 10 10:00:12 NF5468 systemd[1]: Stopping ollama.service - Ollama Service...
Feb 10 10:00:12 NF5468 systemd[1]: ollama.service: Deactivated successfully.
Feb 10 10:00:12 NF5468 systemd[1]: Stopped ollama.service - Ollama Service.
Feb 10 10:00:12 NF5468 systemd[1]: ollama.service: Consumed 2.181s CPU time, 44.9M memory peak, 0B memory swap peak.
Feb 10 10:00:12 NF5468 systemd[1]: Started ollama.service - Ollama Service.
Feb 10 10:00:12 NF5468 ollama[69307]: 2025/02/10 10:00:12 routes.go:1187: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HT>
Feb 10 10:00:12 NF5468 ollama[69307]: time=2025-02-10T10:00:12.872+08:00 level=INFO source=images.go:432 msg="total blobs: 0"
Feb 10 10:00:12 NF5468 ollama[69307]: time=2025-02-10T10:00:12.872+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
Feb 10 10:00:12 NF5468 ollama[69307]: [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
Feb 10 10:00:12 NF5468 ollama[69307]: [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
Feb 10 10:00:12 NF5468 ollama[69307]:  - using env:        export GIN_MODE=release
Feb 10 10:00:12 NF5468 ollama[69307]:  - using code:        gin.SetMode(gin.ReleaseMode)
Feb 10 10:00:12 NF5468 ollama[69307]: [GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/serve

@CHN-STUDENT (Author)

That should not happen; deep thinking is on by default.

@alexw994

How fast does it run on the L20s?

@CHN-STUDENT (Author) commented Feb 19, 2025 via email
