- Go to https://portal.azure.com/ and create a new virtual machine with the following settings:
  - Name: `opea-demo`
  - Region: `(US) West US 2`
  - Availability zone: `Zone 2`
  - Image: `Ubuntu Server 24.04 LTS - x64 Gen2`
  - Size: `Standard_D8s_v4` (8 vcpus, 32 GiB memory)
  - Key pair name: `azure-opea-demo`
- Click on `Next : Disks >`.
- Choose an OS disk size of `512 GiB (P20)`.
- Select `Review + Create`.
- Once you see the message `Validation passed`, click on the `Create` button.
- Click on `Download private key and create resource`, then click `Go to resource`.
- Click on `Connect` at the top left and select `SSH using Azure CLI`.
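The portal shows the exact command to run; it looks similar to this sketch (the resource group is a placeholder here, substitute the one from your subscription):

```bash
# One-time install of the Azure CLI SSH extension, then connect to the VM.
az extension add --name ssh
az ssh vm --resource-group <your-resource-group> --name opea-demo
```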
NOTE: Pasting the entire command block below at once did not work; the commands had to be copied and run line by line.
```bash
# Add Docker's official GPG key:
sudo apt-get -y update
sudo apt-get -y install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update

# Install the Docker packages:
sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
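As an optional sanity check, verify that the Docker Engine installed cleanly before pulling the larger OPEA images:

```bash
# Pulls and runs a tiny test image; prints a confirmation message on success.
sudo docker run --rm hello-world
```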
- Pull the OPEA Docker images:

```bash
sudo docker pull opea/chatqna:latest
sudo docker pull opea/chatqna-conversation-ui:latest
```
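Optionally confirm that both images are now available locally:

```bash
# Lists the pulled OPEA images with their tags and sizes.
sudo docker images | grep opea
```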
- Replace the HuggingFace API token and the private IP address of the host below, and save the contents in a file named `.env`:

```
host_ip=10.0.0.4 # private IP address of the host
no_proxy=${host_ip}
HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
RERANK_MODEL_ID="BAAI/bge-reranker-base"
LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
TGI_LLM_ENDPOINT="http://${host_ip}:9009"
REDIS_URL="redis://${host_ip}:6379"
INDEX_NAME="rag-redis"
REDIS_HOST=${host_ip}
MEGA_SERVICE_HOST_IP=${host_ip}
EMBEDDING_SERVICE_HOST_IP=${host_ip}
RETRIEVER_SERVICE_HOST_IP=${host_ip}
RERANK_SERVICE_HOST_IP=${host_ip}
LLM_SERVICE_HOST_IP=${host_ip}
BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
```
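If you do not have the VM's private IP handy, you can read it on the VM itself; a sketch assuming a single NIC (the default for this setup), where the first reported address is the private one:

```bash
# Prints the VM's primary private IP address.
hostname -I | awk '{print $1}'
```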
- Download the Docker Compose file:

```bash
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
```
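Before starting, you can optionally render the merged configuration to confirm that Compose picks up the `.env` values:

```bash
# Prints the fully interpolated configuration; missing or misnamed
# variables in .env usually surface here as warnings or empty values.
sudo docker compose -f compose.yaml config
```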
- Start the application:

```bash
sudo docker compose -f compose.yaml up -d
```
- Verify the list of containers:

```bash
sudo docker container ls
```

```
CONTAINER ID   IMAGE                                                                  COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
f2a6fa5ea3b7   opea/chatqna-ui:latest                                                 "docker-entrypoint.s…"   12 seconds ago   Up 10 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
c88745a81f54   opea/chatqna:latest                                                    "python chatqna.py"      12 seconds ago   Up 10 seconds   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
00f9b2f5c296   opea/dataprep-redis:latest                                             "python prepare_doc_…"   12 seconds ago   Up 11 seconds   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
886350aea6fc   opea/llm-tgi:latest                                                    "bash entrypoint.sh"     12 seconds ago   Up 11 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
018d363ed61b   opea/retriever-redis:latest                                            "python retriever_re…"   12 seconds ago   Up 11 seconds   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
826d5ec265f3   opea/embedding-tei:latest                                              "python embedding_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
ef4e354cf4cb   opea/reranking-tei:latest                                              "python reranking_te…"   12 seconds ago   Up 11 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
b2af32528f92   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   12 seconds ago   Up 11 seconds   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
ffd17623f9a2   redis/redis-stack:7.2.0-v9                                             "/entrypoint.sh"         12 seconds ago   Up 11 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
52f70df956a2   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                  "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server
6cd64dca38c1   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                  "text-embeddings-rou…"   12 seconds ago   Up 11 seconds   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
```
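All entries should show an `Up` status. As an optional sketch, this filter surfaces any container that exited during startup:

```bash
# No output means every container is still running.
sudo docker ps -a --filter "status=exited" --format "{{.Names}}: {{.Status}}"
```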
- Export the `host_ip` environment variable:

```bash
export host_ip=10.0.0.4
```
- Validate the services as explained in the OPEA on AWS document; a sample check is sketched below.
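For example, a minimal check of the TEI embedding microservice (assuming the standard text-embeddings-inference `/embed` route; the query text is arbitrary):

```bash
# Expect a JSON array holding one embedding vector for the input text.
curl http://${host_ip}:6006/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?"}'
```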
- Check logs:

```bash
sudo docker logs tgi-service
```

It takes ~5 minutes for this service to be ready. Wait until you see this log output:

```
...
2024-09-14T20:38:05.558334Z  INFO shard-manager: text_generation_launcher: Shard ready in 35.550264586s rank=0
2024-09-14T20:38:05.639996Z  INFO text_generation_launcher: Starting Webserver
2024-09-14T20:38:05.708611Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
2024-09-14T20:54:53.025600Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-09-14T20:54:53.026040Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 80240
2024-09-14T20:54:53.026618Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
2024-09-14T20:54:53.029554Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
2024-09-14T20:54:53.037101Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-09-14T20:54:53.467570Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
2024-09-14T20:54:53.513362Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
2024-09-14T20:54:53.513655Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
2024-09-14T20:54:53.513707Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
2024-09-14T20:54:53.523637Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
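Rather than re-running `docker logs` by hand, you can poll for that final `Connected` line; a small sketch:

```bash
# Poll the TGI log every 10 seconds until the router reports "Connected".
until sudo docker logs tgi-service 2>&1 | grep -q "Connected"; do
  echo "waiting for tgi-service..."
  sleep 10
done
```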
- Ask the question:

```bash
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{ "messages": "What is OPEA?" }'
```

Since the knowledge base does not yet contain anything about OPEA, the model guesses at the acronym:

```
data: b'\n'
data: b'\n'
data: b'The'
data: b' Oklahoma'
data: b' Public'
data: b' Em'
data: b'ploy'
data: b'ees'
data: b' Association'
```
- Update the knowledge base:

```bash
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F 'link_list=["https://opea.dev"]'
```

with the answer:

```
{"status":200,"message":"Data preparation succeeded"}
```
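The same endpoint can also ingest local documents; a sketch, assuming the endpoint's multipart `files` field (`document.pdf` is a hypothetical local file):

```bash
# Uploads a local document for chunking, embedding, and indexing in Redis.
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@./document.pdf"
```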
- Ask the question again:

```bash
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{ "messages": "What is OPEA?" }'
```

with the answer now drawn from the ingested content:

```
data: b'\n'
data: b'O'
data: b'PE'
data: b'A'
data: b' stands'
data: b' for'
data: b' Open'
data: b' Platform'
data: b' for'
data: b' Enterprise'
data: b' AI'
data: b'.'
data: b' It'
```
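When you are finished, the whole stack can be torn down on the VM, and the Azure resources stopped or deleted from the portal to avoid ongoing charges:

```bash
# Stops and removes all containers defined in the Compose file.
sudo docker compose -f compose.yaml down
```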