@arun-gupta
Last active September 14, 2024 21:20
OPEA on Microsoft Azure using Docker Compose

Create your instance

Ubuntu 24.04

  • https://portal.azure.com/
  • Name: opea-demo
  • Region: (US) West US 2
  • Availability zone: Zone 2
  • Image: Ubuntu Server 24.04 LTS - x64 Gen2
  • Size: Standard_D8s_v4 (8 vcpus, 32 GiB memory)
  • Key pair name: azure-opea-demo
  • Click Next : Disks >
  • Set the OS disk size to 512 GB (P20)
  • Select Review + create
  • Once you see the message Validation passed, click the Create button
  • Click Download private key and create resource, then Go to resource
  • Click Connect at the top left and select SSH using Azure CLI
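
The portal steps above can also be scripted with the Azure CLI. This is a sketch, not a tested recipe for this exact setup: the resource-group name and admin username are assumptions, and the `Ubuntu2404` image alias may vary with the CLI version.

```shell
# Hypothetical Azure CLI equivalent of the portal steps above.
# Assumes `az` is installed and `az login` has already been run.
RESOURCE_GROUP=opea-demo-rg   # assumed name, not from the portal steps
VM_NAME=opea-demo
LOCATION=westus2

if command -v az >/dev/null 2>&1; then
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image Ubuntu2404 \
    --size Standard_D8s_v4 \
    --zone 2 \
    --os-disk-size-gb 512 \
    --admin-username azureuser \
    --generate-ssh-keys
else
  echo "az CLI not found; run these commands on a machine that has it"
fi
```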

Install Docker:

NOTE: Pasting the entire block at once may not work; copy and run the commands line by line.

# Add Docker's official GPG key:
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Docker images

  • Pull OPEA Docker images:
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    
  • Create a file named .env with the contents below, replacing Your_Huggingface_API_Token with your Hugging Face API token and host_ip with the private IP address of the host:
    host_ip=10.0.0.4  # private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    
  • Download Docker Compose file:
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    
  • Start the application:
    sudo docker compose -f compose.yaml up -d
    
  • Verify the list of containers:
    sudo docker container ls
    CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
    f2a6fa5ea3b7   opea/chatqna-ui:latest                                                "docker-entrypoint.s…"   12 seconds ago   Up 10 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
    c88745a81f54   opea/chatqna:latest                                                   "python chatqna.py"      12 seconds ago   Up 10 seconds   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
    00f9b2f5c296   opea/dataprep-redis:latest                                            "python prepare_doc_…"   12 seconds ago   Up 11 seconds   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
    886350aea6fc   opea/llm-tgi:latest                                                   "bash entrypoint.sh"     12 seconds ago   Up 11 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
    018d363ed61b   opea/retriever-redis:latest                                           "python retriever_re…"   12 seconds ago   Up 11 seconds   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
    826d5ec265f3   opea/embedding-tei:latest                                             "python embedding_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
    ef4e354cf4cb   opea/reranking-tei:latest                                             "python reranking_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
    b2af32528f92   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   12 seconds ago   Up 11 seconds   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
    ffd17623f9a2   redis/redis-stack:7.2.0-v9                                            "/entrypoint.sh"         12 seconds ago   Up 11 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
    52f70df956a2   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server
    6cd64dca38c1   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
    
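A quick sanity check for the .env file: since it only uses POSIX variable expansion, you can source it and confirm the derived endpoints. A minimal sketch; it writes a small sample file to stay self-contained, but you would point it at your real .env instead.

```shell
# Sketch: verify that the ${host_ip}-derived values in .env expand as intended.
# The sample below mirrors a few lines of the .env shown above.
cat > .env.sample <<'EOF'
host_ip=10.0.0.4
TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
REDIS_URL="redis://${host_ip}:6379"
EOF

set -a            # export every variable the file defines
. ./.env.sample
set +a

echo "$TEI_EMBEDDING_ENDPOINT"
echo "$REDIS_URL"
```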

Validate Services

Export the host_ip environment variable (the private IP address of the host):

export host_ip=10.0.0.4

Validate the services as explained in the OPEA on AWS document.
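
As one concrete example of such a check (the OPEA docs cover the full set), the TEI embedding server answers POST /embed. This is a sketch assuming the 6006 port mapping from the compose file above; a JSON array of floats in the response means the service is up.

```shell
export host_ip=10.0.0.4   # private IP of the host, as above
EMBED_URL="http://${host_ip}:6006/embed"

# TEI's /embed route takes {"inputs": "..."} and returns an embedding vector.
curl -s --connect-timeout 5 "$EMBED_URL" \
     -H 'Content-Type: application/json' \
     -d '{"inputs":"What is OPEA?"}' \
  || echo "embedding service not reachable from here"
```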

LLM Backend Service

  • Check logs:

    sudo docker logs tgi-service
    

    It takes ~5 minutes for this service to be ready. Wait until you see log output like this:

    . . .
    2024-09-14T20:38:05.558334Z  INFO shard-manager: text_generation_launcher: Shard ready in 35.550264586s rank=0
    2024-09-14T20:38:05.639996Z  INFO text_generation_launcher: Starting Webserver
    2024-09-14T20:38:05.708611Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-14T20:54:53.025600Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-14T20:54:53.026040Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 80240
    2024-09-14T20:54:53.026618Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-14T20:54:53.029554Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-14T20:54:53.037101Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
    2024-09-14T20:54:53.467570Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-14T20:54:53.513362Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-14T20:54:53.513655Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-14T20:54:53.513707Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-14T20:54:53.523637Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    
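Rather than eyeballing the logs, the wait can be scripted. A sketch that polls `docker logs` until the router reports Connected; the grep pattern is an assumption based on the log lines shown above, and the retry cap and sleep interval are arbitrary.

```shell
# Sketch: block until tgi-service logs the "Connected" line seen above.
tgi_ready() { grep -q 'text_generation_router::server.*Connected'; }

# Only poll if the container actually exists on this machine.
if sudo docker inspect tgi-service >/dev/null 2>&1; then
  tries=0
  until sudo docker logs tgi-service 2>&1 | tgi_ready; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && { echo "gave up waiting"; break; }
    echo "waiting for tgi-service to warm up..."
    sleep 30
  done
fi
```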

Let's run!

RAG using a hyperlink

  • Ask the question (before the knowledge base is populated, the model hallucinates an incorrect expansion of the acronym):

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
       "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'\n'
    
    data: b'The'
    
    data: b' Oklahoma'
    
    data: b' Public'
    
    data: b' Em'
    
    data: b'ploy'
    
    data: b'ees'
    
    data: b' Association'
    
  • Update knowledge base:

    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
         -H "Content-Type: multipart/form-data" \
         -F 'link_list=["https://opea.dev"]'
    

    which returns:

    {"status":200,"message":"Data preparation succeeded"}
    
  • Ask the question again:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
         "messages": "What is OPEA?"
       }'
    

    with the answer, now grounded in the ingested content:

    data: b'\n'
    
    data: b'O'
    
    data: b'PE'
    
    data: b'A'
    
    data: b' stands'
    
    data: b' for'
    
    data: b' Open'
    
    data: b' Platform'
    
    data: b' for'
    
    data: b' Enterprise'
    
    data: b' AI'
    
    data: b'.'
    
    data: b' It'
    
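The `data: b'...'` lines above are server-sent-event chunks, one token each. A small helper can strip the framing and reassemble the answer as plain text; a sketch applied to a captured sample here, but in practice you would pipe the live `curl` output through it. The `assemble` name is just for illustration.

```shell
# Sketch: strip the SSE framing (data: b'...') and join the token chunks.
assemble() {
  sed -n "s/^data: b'\(.*\)'\$/\1/p" | tr -d '\n'
}

# Sample chunks from the stream above; in practice:  curl ... | assemble
printf "data: b' Open'\n\ndata: b' Platform'\n" | assemble
echo
```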