Local LLM with LlamaEdge
Chat with a local model from the command line
1. Install WasmEdge
$ curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
After the installer finishes, load its environment settings (e.g. source $HOME/.wasmedge/env) or open a new terminal so that the wasmedge binary is on your PATH.
2. Download an LLM model to run on the local machine; here we use Llama 3.2 1B
$ curl -LO https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q5_K_M.gguf
3. Download the chat app to test question-and-answer against the local LLM
$ curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-chat.wasm
4. Run the chat app with the downloaded Llama 3.2 1B model
$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-chat.wasm -p llama-3-chat
[INFO] llama-chat version: 0.14.17
[INFO] Model name: default
[INFO] Model alias: default
[INFO] Prompt template: llama-3-chat
[INFO] Context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Threads: 2
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 1
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] Presence penalty (0.0 = disabled): 0
[INFO] Frequency penalty (0.0 = disabled): 0
[INFO] BNF-like grammar:
[INFO] Enable prompt log: false
[INFO] Enable plugin log: false
[INFO] Temperature for sampling: 1
[INFO] Wasi-nn-ggml plugin: b4273 (commit c9c6e01d)
================================== Running in interactive mode. ===================================
- Press [Ctrl+C] to interject at any time.
- Press [Return] to end the input.
- For multi-line inputs, end each line with '\' and press [Return] to get another line.
[You]:
hello world, how are you ?
[Bot]:
Hello, I'm here and ready to help.
[You]:
who are you ?
[Bot]:
I'm a computer program designed to assist with questions and tasks.
Run an OpenAI-compatible API server
1. Download the API server
$ curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
2. Download the embedding model
$ curl -LO https://huggingface.co/gaianet/Nomic-embed-text-v1.5-Embedding-GGUF/resolve/main/nomic-embed-text-v1.5.f16.gguf
3. Start the API server. The comma-separated flag values pair up by position: the first value of each flag applies to the chat model and the second to the embedding model.
$ wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
  llama-api-server.wasm \
  --model-alias default,embedding \
  --model-name llama-3.2-1b,nomic-embed \
  --prompt-template llama-3-chat,embedding \
  --batch-size 128,8192 \
  --ctx-size 8192,8192
4. Test the chat endpoint
$ curl -X POST http://localhost:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content":"You are a helpful AI assistant"}, {"role":"user", "content":"Hello world"}], "model":"llama-3.2-1b"}'
Output:
{"id":"chatcmpl-f9831ff6-e7c4-4d64-abc0-aa2800c4e635","object":"chat.completion","created":1733725027,"model":"llama-3.2-1b","choices":[{"index":0,"message":{"content":"It's nice to meet you! Is there something I can help with or would you like to chat?","role":"assistant"},"finish_reason":"stop","logprobs":null}],"usage":{"prompt_tokens":23,"completion_tokens":23,"total_tokens":46}}
Call the API server from Python
import openai

# Point the OpenAI client at the local LlamaEdge server instead of api.openai.com.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1/",
    api_key="dfdf"  # placeholder; the local server does not check it
)

# Ask the llama-3.2-1b model (the --model-name set at server start) a question.
response = client.chat.completions.create(
    model="llama-3.2-1b",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "Hello world"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response)
Output of running the script:
$ python hello.py
ChatCompletion(id='chatcmpl-c27cf688-8d52-48a2-88fb-af41cccfa90f',
choices=[Choice(finish_reason='stop', index=0, logprobs=None,
message=ChatCompletionMessage(content="Hello! It's nice to meet you. How can I assist you today?",
refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None))],
created=1733725397, model='llama-3.2-1b', object='chat.completion', service_tier=None, system_fingerprint=None,
usage=CompletionUsage(completion_tokens=18, prompt_tokens=23, total_tokens=41,
completion_tokens_details=None, prompt_tokens_details=None))
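The same client can also stream tokens as they are generated rather than waiting for the full reply. A minimal sketch, assuming the server's chat endpoint honors the standard stream=True option from the OpenAI API:

import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1/", api_key="dfdf")

# Request a streamed response; chunks arrive as the model generates tokens.
stream = client.chat.completions.create(
    model="llama-3.2-1b",
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some servers send metadata-only chunks
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()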