OPEA on Google Cloud using Docker Compose

@arun-gupta · Last active September 14, 2024 21:22

Create your instance

  • https://console.cloud.google.com/

  • c3-standard-8

  • Change boot disk to 500 GB

  • Ubuntu 24.04 LTS

  • Install Docker:

    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    

Docker images

  • Pull OPEA Docker images:
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    
  • Replace the Hugging Face API token and the host's private IP address below, then save the contents to a file named .env:
    host_ip=10.128.0.3 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    
  • Download Docker Compose file:
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    
  • Start the application:
    sudo docker compose -f compose.yaml up -d
    
  • Verify the list of containers:
    sudo docker container ls
    CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED          STATUS         PORTS                                                                                  NAMES
    65b54e433cfe   opea/chatqna-ui:latest                                                "docker-entrypoint.s…"   7 seconds ago    Up 6 seconds   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
    798310e0ca77   opea/chatqna:latest                                                   "python chatqna.py"      7 seconds ago    Up 6 seconds   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
    362f3117a528   opea/dataprep-redis:latest                                            "python prepare_doc_…"   7 seconds ago    Up 6 seconds   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
    3985a4de5dc4   opea/embedding-tei:latest                                             "python embedding_te…"   7 seconds ago    Up 6 seconds   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
    b41907df6672   opea/reranking-tei:latest                                             "python reranking_te…"   7 seconds ago    Up 6 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
    19d9a30f85de   opea/llm-tgi:latest                                                   "bash entrypoint.sh"     7 seconds ago    Up 6 seconds   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
    3fa19c8ec722   opea/retriever-redis:latest                                           "python retriever_re…"   7 seconds ago    Up 6 seconds   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
    14b5ccd5416c   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   19 seconds ago   Up 7 seconds   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
    8f58f9aaefae   redis/redis-stack:7.2.0-v9                                            "/entrypoint.sh"         19 seconds ago   Up 7 seconds   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
    931126a552cb   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   19 seconds ago   Up 7 seconds   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
    5a2c435edc0f   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   19 seconds ago   Up 7 seconds   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server  
    
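As a convenience, the .env file above can also be generated by a short script. A minimal sketch, assuming `hostname -I` lists the instance's private IP first (export HOST_IP and HF_TOKEN to override the detected value and the placeholder token):

```shell
#!/usr/bin/env bash
# Sketch: write the .env file shown above instead of editing it by hand.
# Assumption: `hostname -I` reports the private address first; override
# HOST_IP if it does not.
set -euo pipefail

HOST_IP="${HOST_IP:-$(hostname -I 2>/dev/null | awk '{print $1}' || true)}"
HOST_IP="${HOST_IP:-127.0.0.1}"                    # fallback if detection fails
HF_TOKEN="${HF_TOKEN:-Your_Huggingface_API_Token}"

cat > .env <<EOF
host_ip=${HOST_IP}
no_proxy=${HOST_IP}
HUGGINGFACEHUB_API_TOKEN="${HF_TOKEN}"
EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
RERANK_MODEL_ID="BAAI/bge-reranker-base"
LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:6006"
TEI_RERANKING_ENDPOINT="http://${HOST_IP}:8808"
TGI_LLM_ENDPOINT="http://${HOST_IP}:9009"
REDIS_URL="redis://${HOST_IP}:6379"
INDEX_NAME="rag-redis"
REDIS_HOST=${HOST_IP}
MEGA_SERVICE_HOST_IP=${HOST_IP}
EMBEDDING_SERVICE_HOST_IP=${HOST_IP}
RETRIEVER_SERVICE_HOST_IP=${HOST_IP}
RERANK_SERVICE_HOST_IP=${HOST_IP}
LLM_SERVICE_HOST_IP=${HOST_IP}
BACKEND_SERVICE_ENDPOINT="http://${HOST_IP}:8888/v1/chatqna"
DATAPREP_SERVICE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep"
DATAPREP_GET_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/get_file"
DATAPREP_DELETE_FILE_ENDPOINT="http://${HOST_IP}:6007/v1/dataprep/delete_file"
EOF
echo ".env written for host_ip=${HOST_IP}"
```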

Validate Services

Export host_ip environment variable:

export host_ip=10.128.0.3

Embedding service

Test:

curl ${host_ip}:6006/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'

Answer:

[[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,
  . . . 
0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]
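The response is a nested list with one vector per input; bge-base-en-v1.5 produces 768-dimensional embeddings. A quick sanity check, as a sketch (the `dims` helper is made up for illustration):

```shell
# Sketch: count the dimensions of the first returned embedding.
# For BAAI/bge-base-en-v1.5 the expected value is 768.
dims() { python3 -c 'import json, sys; print(len(json.load(sys.stdin)[0]))'; }

# Usage:
# curl ${host_ip}:6006/embed ... | dims   # expect 768
echo '[[0.1, -0.2, 0.3]]' | dims          # prints 3
```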

Embedding microservice

Test:

curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"2d6bbb69f440491249e672d6039dfd5f","text":"hello","embedding":[0.0007791813,0.042613804
. . .
-0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}

Retriever microservice

Test:

export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'

Answer:

{"id":"429f21479c99008b1ade5d8720cc60dc","retrieved_docs":[],"initial_query":"test","top_n":1}

TEI Reranking service

Test:

curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

Answer:

[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]

Reranking microservice

Test:

curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"65a489a9fae807039905008dce80ef6b","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}

LLM Backend Service

  • Check logs:

    sudo docker logs tgi-service
    

    It takes about 5 minutes for this service to be ready. Wait until you see this log output:

    . . .
    2024-09-12T02:14:07.324250Z  INFO shard-manager: text_generation_launcher: Shard ready in 20.620625696s rank=0
    2024-09-12T02:14:07.380398Z  INFO text_generation_launcher: Starting Webserver
    2024-09-12T02:14:07.526375Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-12T02:14:42.046106Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-12T02:14:42.046591Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 83088
    2024-09-12T02:14:42.047332Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-12T02:14:42.051963Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-12T02:14:42.066054Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
    2024-09-12T02:14:42.473427Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 52d98fdf3b050731f1476f1af6efc7970f14c67e of model Intel/neural-chat-7b-v3-3
    2024-09-12T02:14:42.516228Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-12T02:14:42.516696Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-12T02:14:42.516736Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-12T02:14:42.528179Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    
  • Check TGI service:

    # TGI service
    curl http://${host_ip}:9009/generate \
      -X POST \
      -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
      -H 'Content-Type: application/json'
    

    with the response:

    {"generated_text":"\n\nDeep Learning is a subset of machine learning which focuses on algorithms that learn from"}
    
  • Check the OpenAI-compatible completions endpoint (also served by TGI):

    curl http://${host_ip}:9009/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    

    with the response:

    {"object":"text_completion","id":"","created":1726117774,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
    
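Instead of tailing the logs, readiness can be polled in a loop. A sketch, assuming TGI's `/health` route is reachable on the mapped port 9009 (the `wait_for` helper is made up for illustration):

```shell
# Sketch: poll an HTTP endpoint until it responds with success.
# wait_for URL [TRIES] -- retries every 5 seconds, up to TRIES attempts.
wait_for() {
  local url=$1 tries=${2:-60}
  for _ in $(seq "$tries"); do
    curl -sf "$url" > /dev/null && return 0
    sleep 5
  done
  return 1
}

# Usage:
# wait_for "http://${host_ip}:9009/health" && echo "tgi-service is ready"
```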

LLM microservice

Test:

curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'

Answer:

data: b'\n'

data: b'\n'

data: b'Deep'

data: b' learning'

data: b' is'

data: b' a'

data: b' subset'

data: b' of'

data: b' machine'

data: b' learning'

data: b' that'

data: b' uses'

data: b' algorithms'

data: b' to'

data: b' learn'

data: b' from'

data: b' data'

data: [DONE]
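The streamed chunks arrive as Python bytes literals after a `data: ` prefix. A decoder sketch to reassemble them into plain text (assumption: every content line has the `data: b'...'` shape and the stream ends with `data: [DONE]`; the script name is made up):

```shell
# Sketch: reassemble streamed `data: b'...'` chunks into plain text.
cat > sse_to_text.py <<'PY'
import ast
import sys

chunks = []
for line in sys.stdin:
    line = line.strip()
    if line.startswith("data: b"):  # skips blank lines and the final [DONE]
        chunks.append(ast.literal_eval(line[len("data: "):]).decode())
print("".join(chunks))
PY

# Usage:
# curl http://${host_ip}:9000/v1/chat/completions ... | python3 sse_to_text.py
```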

Megaservice

Test:

curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is the revenue of Nike in 2023?"
   }'

Answer:

data: b'\n'

data: b'\n'

data: b'N'

data: b'ike'

data: b"'"

data: b's'

data: b' revenue'

data: b' for'

. . .

data: b' popularity'

data: b' among'

data: b' consumers'

data: b'.'

data: b'</s>'

data: [DONE]

Let's run!

RAG using hyperlink

  • Ask the question:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
       "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'\n'
    
    data: b'The'
    
    data: b' Oklahoma'
    
    data: b' Public'
    
    data: b' Em'
    
    data: b'ploy'
    
    data: b'ees'
    
    data: b' Association'
    
  • Update knowledge base:

    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
         -H "Content-Type: multipart/form-data" \
         -F 'link_list=["https://opea.dev"]'
    {"status":200,"message":"Data preparation succeeded"}
    {"status":200,"message":"Data preparation succeeded"}status:200: command not found
    
  • Ask the question:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
         "messages": "What is OPEA?"
       }'
    data: b'\n'
    
    data: b'O'
    
    data: b'PE'
    
    data: b'A'
    
    data: b' stands'
    
    data: b' for'
    
    data: b' Open'
    
    data: b' Platform'
    
    data: b' for'
    
    data: b' Enterprise'
    
    data: b' AI'
    
    data: b'.'
    
    data: b' It'
    
  • Delete link from the knowledge base:

    # delete link
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
         -d '{"file_path": "https://opea.dev"}' \
         -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    

    This is a known error, tracked in opea-project/GenAIExamples#724.
