OPEA on Google Cloud using Docker Compose

@arun-gupta · Last active September 14, 2024 21:22

Create your instance

  • https://console.cloud.google.com/

  • c3-standard-8

  • Change boot disk to 500 GB

  • Ubuntu 24.04 LTS

  • Install Docker:

    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    

Docker images

  • Pull OPEA Docker images:
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    
  • Replace the Hugging Face API token and the host's private IP address below, then save the contents to a file named .env:
    host_ip=10.128.0.3 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    
  • Download Docker Compose file:
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    
  • Start the application:
    sudo docker compose -f compose.yaml up -d
    
  • Verify the list of containers:
    sudo docker container ls
    CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED          STATUS         PORTS                                                                                  NAMES
    65b54e433cfe   opea/chatqna-ui:latest                                                "docker-entrypoint.s…"   7 seconds ago    Up 6 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
    798310e0ca77   opea/chatqna:latest                                                   "python chatqna.py"      7 seconds ago    Up 6 seconds   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
    362f3117a528   opea/dataprep-redis:latest                                            "python prepare_doc_…"   7 seconds ago    Up 6 seconds   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
    3985a4de5dc4   opea/embedding-tei:latest                                             "python embedding_te…"   7 seconds ago    Up 6 seconds   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
    b41907df6672   opea/reranking-tei:latest                                             "python reranking_te…"   7 seconds ago    Up 6 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
    19d9a30f85de   opea/llm-tgi:latest                                                   "bash entrypoint.sh"     7 seconds ago    Up 6 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
    3fa19c8ec722   opea/retriever-redis:latest                                           "python retriever_re…"   7 seconds ago    Up 6 seconds   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
    14b5ccd5416c   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   19 seconds ago   Up 7 seconds   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
    8f58f9aaefae   redis/redis-stack:7.2.0-v9                                            "/entrypoint.sh"         19 seconds ago   Up 7 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
    931126a552cb   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   19 seconds ago   Up 7 seconds   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
    5a2c435edc0f   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   19 seconds ago   Up 7 seconds   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server  
    
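As a convenience, the .env file above can also be generated by a short script. A minimal sketch, assuming `hostname -I` lists the instance's private IP first (export HOST_IP and HF_TOKEN to override the detected value and the placeholder token):

```shell
#!/usr/bin/env bash
# Sketch: write the .env file shown above instead of editing it by hand.
# Assumption: `hostname -I` reports the private address first; override
# HOST_IP if it does not.
set -euo pipefail

HOST_IP="${HOST_IP:-$(hostname -I 2>/dev/null | awk '{print $1}' || true)}"
HOST_IP="${HOST_IP:-127.0.0.1}"                    # fallback if detection fails
HF_TOKEN="${HF_TOKEN:-Your_Huggingface_API_Token}"

cat > .env <<EOF
host_ip=${HOST_IP}
no_proxy=${HOST_IP}
HUGGINGFACEHUB_API_TOKEN="${HF_TOKEN}"
EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
RERANK_MODEL_ID="BAAI/bge-reranker-base"
LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:6006"
TEI_RERANKING_ENDPOINT="http://${HOST_IP}:8808"
TGI_LLM_ENDPOINT="http://${HOST_IP}:9009"
REDIS_URL="redis://${HOST_IP}:6379"
INDEX_NAME="rag-redis"
REDIS_HOST=${HOST_IP}
MEGA_SERVICE_HOST_IP=${HOST_IP}
EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
RERANK_SERVICE_HOST_IP=${HOST_IP}
LLM_SERVICE_HOST_IP=${HOST_IP}
BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:8888/v1/chatqna"
DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep"
DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get_file"
DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete_file"
EOF
echo ".env written for host_ip=${HOST_IP}"
```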

Validate Services

Export host_ip environment variable:

export host_ip=10.128.0.3

Embedding service

Test:

curl ${host_ip}:6006/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'

Answer:

[[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
  . . . 
0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
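The response is a nested list with one vector per input; bge-base-en-v1.5 produces 768-dimensional embeddings. A quick sanity check, as a sketch (the `dims` helper is made up for illustration):

```shell
# Sketch: count the dimensions of the first returned embedding.
# For BAAI/bge-base-en-v1.5 the expected value is 768.
dims() { python3 -c 'import json, sys; print(len(json.load(sys.stdin)[0]))'; }

# Usage:
# curl ${host_ip}:6006/embed ... | dims   # expect 768
echo '[[0.1, -0.2, 0.3]]' | dims          # prints 3
```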

Embedding microservice

Test:

curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
. . .
-0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}

Retriever microservice

Test:

export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'

Answer:

{"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}

TEI Reranking service

Test:

curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

Answer:

[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]

Reranking microservice

Test:

curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}

LLM Backend Service

  • Check logs:

    sudo docker logs tgi-service
    

    It takes about 5 minutes for this service to be ready. Wait until you see this log output:

    . . .
    2024-09-12T02:14:07.324250Z  INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z  INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
    2024-09-12T02:14:42.473427Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    
  • Check TGI service:

    # TGI service
    curl http://${host_ip}:9009/generate \
      -X POST \
      -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
      -H 'Content-Type: application/json'
    

    with the response:

    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    
  • Check the OpenAI-compatible completions endpoint (also served by TGI):

    curl http://${host_ip}:9009/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    

    with the response:

    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    
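Instead of tailing the logs, readiness can be polled in a loop. A sketch, assuming TGI's `/health` route is reachable on the mapped port 9009 (the `wait_for` helper is made up for illustration):

```shell
# Sketch: poll an HTTP endpoint until it responds with success.
# wait_for URL [TRIES] -- retries every 5 seconds, up to TRIES attempts.
wait_for() {
  local url=$1 tries=${2:-60}
  for _ in $(seq "$tries"); do
    curl -sf "$url" > /dev/null && return 0
    sleep 5
  done
  return 1
}

# Usage:
# wait_for "http://${host_ip}:9009/health" && echo "tgi-service is ready"
```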

LLM microservice

Test:

curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'

Answer:

data: b'\n'

data: b'\n'

data: b'Deep'

data: b' learning'

data: b' is'

data: b' a'

data: b' subset'

data: b' of'

data: b' machine'

data: b' learning'

data: b' that'

data: b' uses'

data: b' algorithms'

data: b' to'

data: b' learn'

data: b' from'

data: b' data'

data: [DONE]
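The streamed chunks arrive as Python bytes literals after a `data: ` prefix. A decoder sketch to reassemble them into plain text (assumption: every content line has the `data: b'...'` shape and the stream ends with `data: [DONE]`; the script name is made up):

```shell
# Sketch: reassemble streamed `data: b'...'` chunks into plain text.
cat > sse_to_text.py <<'PY'
import ast
import sys

chunks = []
for line in sys.stdin:
    line = line.strip()
    if line.startswith("data: b"):  # skips blank lines and the final [DONE]
        chunks.append(ast.literal_eval(line[len("data: "):]).decode())
print("".join(chunks))
PY

# Usage:
# curl http://${host_ip}:9000/v1/chat/completions ... | python3 sse_to_text.py
```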

Megaservice

Test:

curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is the revenue of Nike in 2023?"
   }'

Answer:

data: b'\n'

data: b'\n'

data: b'N'

data: b'ike'

data: b"'"

data: b's'

data: b' revenue'

data: b' for'

. . .

data: b' popularity'

data: b' among'

data: b' consumers'

data: b'.'

data: b'</s>'

data: [DONE]

Let's run!

RAG using hyperlink

  • Ask the question:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
       "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'\n'
    
    data: b'The'
    
    data: b' Oklahoma'
    
    data: b' Public'
    
    data: b' Em'
    
    data: b'ploy'
    
    data: b'ees'
    
    data: b' Association'
    
  • Update knowledge base:

    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
         -H "Content-Type: multipart/form-data" \
         -F 'link_list=["https://opea.dev"]'
    {"status":200,"message":"Data preparation succeeded"}
    {"status":200,"message":"Data preparation succeeded"}status:200: command not found
    
  • Ask the question:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
         "messages": "What is OPEA?"
       }'
    data: b'\n'
    
    data: b'O'
    
    data: b'PE'
    
    data: b'A'
    
    data: b' stands'
    
    data: b' for'
    
    data: b' Open'
    
    data: b' Platform'
    
    data: b' for'
    
    data: b' Enterprise'
    
    data: b' AI'
    
    data: b'.'
    
    data: b' It'
    
  • Delete link from the knowledge base:

    # delete link
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
         -d '{"file_path": "https://opea.dev"}' \
         -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    

    This is a known error, tracked in opea-project/GenAIExamples#724.
