@rajivmehtaflex
Created June 29, 2024 04:31
Quick Chatbot Deployment
After completing the setup below, start the server with: python server.py
pip install huggingface-hub gradio llama-cpp-python \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
huggingface-cli login
mkdir models
cd models
huggingface-cli download Qwen/Qwen2-0.5B-Instruct-GGUF \
    qwen2-0_5b-instruct-q5_k_m.gguf \
    --local-dir . --local-dir-use-symlinks False
cd ..
touch server.py
import gradio as gr
from llama_cpp import Llama

# Load the quantized Qwen2 model downloaded into ./models
llm = Llama(
    model_path="./models/qwen2-0_5b-instruct-q5_k_m.gguf",
    verbose=True,
)

def predict(message, history):
    # Rebuild the chat transcript: history is a list of (user, bot) tuples
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_message, bot_message in history:
        if user_message:
            messages.append({"role": "user", "content": user_message})
        if bot_message:
            messages.append({"role": "assistant", "content": bot_message})
    messages.append({"role": "user", "content": message})

    # Stream the completion, yielding the growing response so Gradio
    # renders tokens as they arrive
    response = ""
    for chunk in llm.create_chat_completion(
        stream=True,
        messages=messages,
    ):
        part = chunk["choices"][0]["delta"].get("content", None)
        if part:
            response += part
        yield response

demo = gr.ChatInterface(predict)

if __name__ == "__main__":
    demo.launch(share=True)
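The streaming logic in predict() can be exercised without downloading the model. Below is a minimal sketch where fake_stream is a stand-in for llm.create_chat_completion (its chunk shape mirrors the streamed chat-completion format used above); the stream parameter and token strings are illustrative assumptions, not part of the original gist.

```python
def fake_stream(messages):
    # Stand-in for llm.create_chat_completion(stream=True, ...):
    # yields chunks shaped like streamed chat-completion deltas
    for token in ["Hello", ", ", "world", "!"]:
        yield {"choices": [{"delta": {"content": token}}]}

def predict(message, history, stream=fake_stream):
    # Same transcript-building and accumulation logic as server.py
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_message, bot_message in history:
        if user_message:
            messages.append({"role": "user", "content": user_message})
        if bot_message:
            messages.append({"role": "assistant", "content": bot_message})
    messages.append({"role": "user", "content": message})

    response = ""
    for chunk in stream(messages):
        part = chunk["choices"][0]["delta"].get("content")
        if part:
            response += part
        yield response

# Each yielded value is the response so far, which is what lets
# gr.ChatInterface update the reply incrementally
partials = list(predict("Hi", [("earlier question", "earlier answer")]))
```

Yielding the accumulated string (rather than individual tokens) is what ChatInterface expects from a streaming generator: each yield replaces the displayed bot message.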