@arun-gupta
Last active September 14, 2024 21:20
OPEA on Microsoft Azure using Docker Compose

Create your instance

Ubuntu 24.04

  • https://portal.azure.com/
  • Name: opea-demo
  • Region: (US) West US 2
  • Availability zone: Zone 2
  • Image: Ubuntu Server 24.04 LTS - x64 Gen2
  • Size: Standard_D8s_v4 (8 vcpus, 32 GiB memory)
  • Key pair name: azure-opea-demo
  • Click Next : Disks >
  • Set the OS disk size to 512 GB (P20)
  • Select Review + create
  • Once you see the message Validation passed, click the Create button
  • Click Download private key and create resource, then Go to resource
  • Click Connect at the top left and select SSH using Azure CLI
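
The portal steps above can also be scripted with the Azure CLI. This is a sketch, not a tested recipe for this exact setup: the resource-group name and admin username are assumptions, and the `Ubuntu2404` image alias may vary with the CLI version.

```shell
# Hypothetical Azure CLI equivalent of the portal steps above.
# Assumes `az` is installed and `az login` has already been run.
RESOURCE_GROUP=opea-demo-rg   # assumed name, not from the portal steps
VM_NAME=opea-demo
LOCATION=westus2

if command -v az >/dev/null 2>&1; then
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image Ubuntu2404 \
    --size Standard_D8s_v4 \
    --zone 2 \
    --os-disk-size-gb 512 \
    --admin-username azureuser \
    --generate-ssh-keys
else
  echo "az CLI not found; run these commands on a machine that has it"
fi
```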

Install Docker:

NOTE: Pasting the entire block at once may not work; copy and run the commands line by line.

# Add Docker's official GPG key:
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Docker images

  • Pull OPEA Docker images:
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    
  • Create a file named .env with the contents below, replacing Your_Huggingface_API_Token with your Hugging Face API token and host_ip with the private IP address of the host:
    host_ip=10.0.0.4  # private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    
  • Download Docker Compose file:
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    
  • Start the application:
    sudo docker compose -f compose.yaml up -d
    
  • Verify the list of containers:
    sudo docker container ls
    CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
    f2a6fa5ea3b7   opea/chatqna-ui:latest                                                "docker-entrypoint.s…"   12 seconds ago   Up 10 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
    c88745a81f54   opea/chatqna:latest                                                   "python chatqna.py"      12 seconds ago   Up 10 seconds   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
    00f9b2f5c296   opea/dataprep-redis:latest                                            "python prepare_doc_…"   12 seconds ago   Up 11 seconds   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
    886350aea6fc   opea/llm-tgi:latest                                                   "bash entrypoint.sh"     12 seconds ago   Up 11 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
    018d363ed61b   opea/retriever-redis:latest                                           "python retriever_re…"   12 seconds ago   Up 11 seconds   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
    826d5ec265f3   opea/embedding-tei:latest                                             "python embedding_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
    ef4e354cf4cb   opea/reranking-tei:latest                                             "python reranking_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
    b2af32528f92   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   12 seconds ago   Up 11 seconds   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
    ffd17623f9a2   redis/redis-stack:7.2.0-v9                                            "/entrypoint.sh"         12 seconds ago   Up 11 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
    52f70df956a2   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server
    6cd64dca38c1   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
    
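A quick sanity check for the .env file: since it only uses POSIX variable expansion, you can source it and confirm the derived endpoints. A minimal sketch; it writes a small sample file to stay self-contained, but you would point it at your real .env instead.

```shell
# Sketch: verify that the ${host_ip}-derived values in .env expand as intended.
# The sample below mirrors a few lines of the .env shown above.
cat > .env.sample <<'EOF'
host_ip=10.0.0.4
TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
REDIS_URL="redis://${host_ip}:6379"
EOF

set -a            # export every variable the file defines
. ./.env.sample
set +a

echo "$TEI_EMBEDDING_ENDPOINT"
echo "$REDIS_URL"
```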

Validate Services

Export the host_ip environment variable (the private IP address of the host):

export host_ip=10.0.0.4

Validate the services as explained in the OPEA on AWS document.
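
As one concrete example of such a check (the OPEA docs cover the full set), the TEI embedding server answers POST /embed. This is a sketch assuming the 6006 port mapping from the compose file above; a JSON array of floats in the response means the service is up.

```shell
export host_ip=10.0.0.4   # private IP of the host, as above
EMBED_URL="http://${host_ip}:6006/embed"

# TEI's /embed route takes {"inputs": "..."} and returns an embedding vector.
curl -s --connect-timeout 5 "$EMBED_URL" \
     -H 'Content-Type: application/json' \
     -d '{"inputs":"What is OPEA?"}' \
  || echo "embedding service not reachable from here"
```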

LLM Backend Service

  • Check logs:

    sudo docker logs tgi-service
    

    It takes ~5 minutes for this service to be ready. Wait until you see log output like this:

    . . .
    2024-09-14T20:38:05.558334Z  INFO shard-manager: text_generation_launcher: Shard ready in 35.550264586s rank=0
    2024-09-14T20:38:05.639996Z  INFO text_generation_launcher: Starting Webserver
    2024-09-14T20:38:05.708611Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-14T20:54:53.025600Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-14T20:54:53.026040Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 80240
    2024-09-14T20:54:53.026618Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-14T20:54:53.029554Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-14T20:54:53.037101Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
    2024-09-14T20:54:53.467570Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-14T20:54:53.513362Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-14T20:54:53.513655Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-14T20:54:53.513707Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-14T20:54:53.523637Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    
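Rather than eyeballing the logs, the wait can be scripted. A sketch that polls `docker logs` until the router reports Connected; the grep pattern is an assumption based on the log lines shown above, and the retry cap and sleep interval are arbitrary.

```shell
# Sketch: block until tgi-service logs the "Connected" line seen above.
tgi_ready() { grep -q 'text_generation_router::server.*Connected'; }

# Only poll if the container actually exists on this machine.
if sudo docker inspect tgi-service >/dev/null 2>&1; then
  tries=0
  until sudo docker logs tgi-service 2>&1 | tgi_ready; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && { echo "gave up waiting"; break; }
    echo "waiting for tgi-service to warm up..."
    sleep 30
  done
fi
```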

Let's run!

RAG using a hyperlink

  • Ask the question (before the knowledge base is populated, the model hallucinates an incorrect expansion of the acronym):

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
       "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'\n'
    
    data: b'The'
    
    data: b' Oklahoma'
    
    data: b' Public'
    
    data: b' Em'
    
    data: b'ploy'
    
    data: b'ees'
    
    data: b' Association'
    
  • Update knowledge base:

    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
         -H "Content-Type: multipart/form-data" \
         -F 'link_list=["https://opea.dev"]'
    

    which returns:

    {"status":200,"message":"Data preparation succeeded"}
    
  • Ask the question again:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
         "messages": "What is OPEA?"
       }'
    

    with the answer, now grounded in the ingested content:

    data: b'\n'
    
    data: b'O'
    
    data: b'PE'
    
    data: b'A'
    
    data: b' stands'
    
    data: b' for'
    
    data: b' Open'
    
    data: b' Platform'
    
    data: b' for'
    
    data: b' Enterprise'
    
    data: b' AI'
    
    data: b'.'
    
    data: b' It'
    
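The `data: b'...'` lines above are server-sent-event chunks, one token each. A small helper can strip the framing and reassemble the answer as plain text; a sketch applied to a captured sample here, but in practice you would pipe the live `curl` output through it. The `assemble` name is just for illustration.

```shell
# Sketch: strip the SSE framing (data: b'...') and join the token chunks.
assemble() {
  sed -n "s/^data: b'\(.*\)'\$/\1/p" | tr -d '\n'
}

# Sample chunks from the stream above; in practice:  curl ... | assemble
printf "data: b' Open'\n\ndata: b' Platform'\n" | assemble
echo
```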