@arun-gupta
Last active September 18, 2024 04:10
OPEA on AWS using Docker Compose

Pick your AMI

Ubuntu

  • Launch an Ubuntu 24.04, m7i.4xlarge instance (16 vCPU, 64 GB memory). Increase the storage to 500 GB.

  • Install Docker:

    # Add Docker's official GPG key:
    sudo apt-get -y update
    sudo apt-get -y install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get -y update
    sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    

Amazon Linux 2023

  • Using the AWS Console, launch an Amazon Linux 2023, m7i.4xlarge instance (16 vCPU, 64 GB memory). Increase the storage to 500 GB.
  • Install Docker:
    sudo yum update -y
    sudo yum install -y docker
    sudo service docker start
    # Optional: allow running docker without sudo (log out and back in to apply)
    #sudo usermod -a -G docker ec2-user
    
  • Install Docker Compose:
    sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    

Docker images

  • Pull OPEA Docker images:
    sudo docker pull opea/chatqna:latest
    sudo docker pull opea/chatqna-conversation-ui:latest
    
  • Replace the Hugging Face API token and the private IP address of the host below, then save the contents in a file named .env:
    host_ip=172.31.37.13 #private IP address of the host
    no_proxy=${host_ip}
    HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
    EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
    RERANK_MODEL_ID="BAAI/bge-reranker-base"
    LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
    TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
    TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
    TGI_LLM_ENDPOINT="http://${host_ip}:9009"
    REDIS_URL="redis://${host_ip}:6379"
    INDEX_NAME="rag-redis"
    REDIS_HOST=${host_ip}
    MEGA_SERVICE_HOST_IP=${host_ip}
    EMBEDDING_SERVICE_HOST_IP=${host_ip}
    RETRIEVER_SERVICE_HOST_IP=${host_ip}
    RERANK_SERVICE_HOST_IP=${host_ip}
    LLM_SERVICE_HOST_IP=${host_ip}
    BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
    DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
    DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
    DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"
    
  • Download Docker Compose file:
    curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
    
  • Start the application:
    sudo docker compose -f compose.yaml up -d
    
  • Verify the list of containers:
    ubuntu@ip-172-31-79-111:~$ sudo docker container ls
    CONTAINER ID   IMAGE                                                                 COMMAND                  CREATED         STATUS         PORTS                                                                                  NAMES
    29f3a466d175   opea/chatqna-ui:latest                                                "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp                                              chatqna-xeon-ui-server
    1020fa2a75c2   opea/chatqna:latest                                                   "python chatqna.py"      4 minutes ago   Up 4 minutes   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp                                              chatqna-xeon-backend-server
    02112b28ee54   opea/dataprep-redis:latest                                            "python prepare_doc_…"   4 minutes ago   Up 4 minutes   0.0.0.0:6007->6007/tcp, :::6007->6007/tcp                                              dataprep-redis-server
    94aaec2991d6   opea/retriever-redis:latest                                           "python retriever_re…"   4 minutes ago   Up 4 minutes   0.0.0.0:7000->7000/tcp, :::7000->7000/tcp                                              retriever-redis-server
    9fb6744ceb24   opea/llm-tgi:latest                                                   "bash entrypoint.sh"     4 minutes ago   Up 4 minutes   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp                                              llm-tgi-server
    27576d976a3d   opea/embedding-tei:latest                                             "python embedding_te…"   4 minutes ago   Up 4 minutes   0.0.0.0:6000->6000/tcp, :::6000->6000/tcp                                              embedding-tei-server
    3e04371fd54b   opea/reranking-tei:latest                                             "python reranking_te…"   4 minutes ago   Up 4 minutes   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp                                              reranking-tei-xeon-server
    62929403a9ed   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   5 minutes ago   Up 4 minutes   0.0.0.0:9009->80/tcp, [::]:9009->80/tcp                                                tgi-service
    50208c6bc36c   redis/redis-stack:7.2.0-v9                                            "/entrypoint.sh"         5 minutes ago   Up 4 minutes   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp, 0.0.0.0:8001->8001/tcp, :::8001->8001/tcp   redis-vector-db
    2a4158c2dbc8   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   5 minutes ago   Up 4 minutes   0.0.0.0:8808->80/tcp, [::]:8808->80/tcp                                                tei-reranking-server
    47a59e0d52de   ghcr.io/huggingface/text-embeddings-inference:cpu-1.5                 "text-embeddings-rou…"   5 minutes ago   Up 4 minutes   0.0.0.0:6006->80/tcp, [::]:6006->80/tcp                                                tei-embedding-server
    

Validate Services

Export the host_ip environment variable, using the private IP address of your host:

export host_ip=172.31.37.13
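
Rather than hard-coding the address, host_ip can be derived on the instance itself. A sketch, assuming the first address reported by hostname -I is the primary private IPv4 (typical for a single-interface EC2 instance):

```shell
# Derive the private IPv4 instead of hard-coding it; on a single-interface
# EC2 instance the first address from hostname -I is the primary private IP
export host_ip=$(hostname -I | awk '{print $1}')
echo "host_ip=${host_ip}"

# Alternative: ask the EC2 instance metadata service
# export host_ip=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
```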

Embedding service

Test:

curl ${host_ip}:6006/embed \
  -X POST \
  -d '{"inputs":"What is Deep Learning?"}' \
  -H 'Content-Type: application/json'

Answer:

[[0.00037115702,-0.06356819,0.0024758505,-0.012360337,0.050739925,0.023380278,0.022216318,0.0008076447,-0.0003412891,
  . . . 
-0.0067949123,0.022558564,-0.04570635,-0.033072025,0.022725677,0.016026087,-0.02125421,-0.02984927,-0.0049473033]]

Embedding microservice

Test:

curl http://${host_ip}:6000/v1/embeddings \
  -X POST \
  -d '{"text":"hello"}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"b73c50b8a8b535c3af708ebc16b0d9cd","text":"hello","embedding":[0.0007791813,0.042613804,0.020304274,-0.0070378557,0.00020632005,0.020170836,-0.00021343566,0.04560513,-0.04856186,-0.0681003
. . .
027401684,-0.052007433,0.016100302,0.059366036,-0.0044034636],"search_type":"similarity","k":4,"distance_threshold":null,"fetch_k":20,"lambda_mult":0.5,"score_threshold":0.2}

Retriever microservice

Test:

export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${host_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'

Answer:

{"id":"4e0eb0f1ac507c4fbd8f4c843f705f78","retrieved_docs":[],"initial_query":"test","top_n":1}

TEI Reranking service

Test:

curl http://${host_ip}:8808/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'

Answer:

[{"index":1,"score":0.94238955},{"index":0,"score":0.120219156}]

Reranking microservice

Test:

curl http://${host_ip}:8000/v1/reranking \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
  -H 'Content-Type: application/json'

Answer:

{"id":"41d35cb9153ceb018d62fd1271194aa5","model":null,"query":"What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true,"chat_template":null,"documents":["Deep learning is..."]}

LLM Backend Service

  • Check logs:

    sudo docker logs tgi-service
    

    It takes ~5 minutes for this service to be ready. Wait until you see log output like the following:

    . . .
    2024-09-03T17:28:42.909843Z  INFO text_generation_launcher: Downloaded /data/models--Intel--neural-chat-7b-v3-3/snapshots/bdd31cf498d13782cc7497cba5896996ce429f91/pytorch_model-00002-of-00002.bin in 0:00:25.
    2024-09-03T17:28:42.909864Z  INFO text_generation_launcher: Download: [2/2] -- ETA: 0
    2024-09-03T17:28:42.909880Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
    2024-09-03T17:28:42.910155Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
    2024-09-03T17:30:26.694416Z  INFO text_generation_launcher: Convert: [1/2] -- Took: 0:01:43.759912
    2024-09-03T17:30:58.726409Z  INFO text_generation_launcher: Convert: [2/2] -- Took: 0:00:32.031806
    2024-09-03T17:30:59.727506Z  INFO download: text_generation_launcher: Successfully downloaded weights for Intel/neural-chat-7b-v3-3
    2024-09-03T17:30:59.727942Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
    2024-09-03T17:31:03.179128Z  WARN text_generation_launcher: FBGEMM fp8 kernels are not installed.
    2024-09-03T17:31:03.196988Z  INFO text_generation_launcher: Using Attention = False
    2024-09-03T17:31:03.197034Z  INFO text_generation_launcher: Using Attention = paged
    2024-09-03T17:31:03.251121Z  WARN text_generation_launcher: Could not import Mamba: No module named 'mamba_ssm'
    2024-09-03T17:31:03.410013Z  INFO text_generation_launcher: affinity={0, 1, 2, 3, 4, 5, 6, 7}, membind = {0}
    2024-09-03T17:31:06.539109Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
    2024-09-03T17:31:06.584728Z  INFO shard-manager: text_generation_launcher: Shard ready in 6.806910922s rank=0
    2024-09-03T17:31:06.633462Z  INFO text_generation_launcher: Starting Webserver
    2024-09-03T17:31:06.820764Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
    2024-09-03T17:31:22.384496Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
    2024-09-03T17:31:22.384696Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 292528
    2024-09-03T17:31:22.384724Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
    2024-09-03T17:31:22.384750Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
    2024-09-03T17:31:22.384775Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
    2024-09-03T17:31:22.832789Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3
    2024-09-03T17:31:22.862858Z  INFO text_generation_router::server: router/src/server.rs:1747: Overriding LlamaTokenizer with TemplateProcessing to follow python override defined in https://github.com/huggingface/transformers/blob/4aa17d00690b7f82c95bb2949ea57e22c35b4336/src/transformers/models/llama/tokenization_llama_fast.py#L203-L205
    2024-09-03T17:31:22.862918Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Mistral)
    2024-09-03T17:31:22.862944Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
    2024-09-03T17:31:22.868816Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
    
  • Check TGI service:

    # TGI service
    curl http://${host_ip}:9009/generate \
      -X POST \
      -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
      -H 'Content-Type: application/json'
    

    with the response:

    {"generated_text":"\n\nDeep Learning is a subset of Machine Learning based on Artificial Neural Network"}
    
  • Check the OpenAI-compatible completions endpoint (served by the same TGI container, as the system_fingerprint in the response shows):

    curl http://${host_ip}:9009/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
    

    with the response:

    {"object":"text_completion","id":"","created":1725387779,"model":"Intel/neural-chat-7b-v3-3","system_fingerprint":"2.2.1-dev0-sha-e4201f4-intel-cpu","choices":[{"index":0,"text":"\n\nDeep Learning is a subset of Machine Learning that is concerned with algorithms inspired by the structure and function of the brain. It is a part of Artificial","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":6,"completion_tokens":32,"total_tokens":38}}
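
    Rather than tailing the logs, readiness can be detected by polling TGI's /health route (a standard TGI endpoint; host port 9009 maps to the container's port 80). A sketch:

    ```shell
    # Block until tgi-service answers its health check; curl -f exits non-zero
    # on HTTP errors, so the loop keeps waiting until the model is loaded
    until curl -sf "http://${host_ip}:9009/health" > /dev/null; do
      echo "waiting for tgi-service..."
      sleep 10
    done
    echo "tgi-service is ready"
    ```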
    

LLM microservice

Test:

curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'

Answer:

data: b'\n'

data: b'\n'

data: b'Deep'

data: b' learning'

data: b' is'

data: b' a'

data: b' subset'

data: b' of'

data: b' machine'

data: b' learning'

data: b' that'

data: b' uses'

data: b' algorithms'

data: b' to'

data: b' learn'

data: b' from'

data: b' data'

data: [DONE]
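
Each SSE line carries one token wrapped in data: b'…' framing. A small sed/tr pipeline can stitch a captured stream back into readable text; a sketch over a sample (it drops the [DONE] sentinel and ignores edge cases such as double-quoted tokens like b"'"):

```shell
# Sample of the framing emitted by the LLM microservice
sample="data: b'Deep'
data: b' learning'
data: b' is'
data: [DONE]"

# Keep only the payload between the b'…' quotes, then join the tokens
echo "$sample" \
  | sed -n "s/^data: b'\(.*\)'/\1/p" \
  | tr -d '\n'
# → Deep learning is
```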

Megaservice

Test:

curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "messages": "What is the revenue of Nike in 2023?"
   }'

Answer:

data: b'\n'

data: b'\n'

data: b'N'

data: b'ike'

data: b"'"

data: b's'

data: b' revenue'

data: b' for'

. . .

data: b' popularity'

data: b' among'

data: b' consumers'

data: b'.'

data: b'</s>'

data: [DONE]

Let's run!

RAG using hyperlink

  • Ask the question:

    [ec2-user@ip-172-31-77-194 ~]$ curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
       "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'\n'
    
    data: b'The'
    
    data: b' Oklahoma'
    
    data: b' Public'
    
    data: b' Em'
    
    data: b'ploy'
    
    data: b'ees'
    
    data: b' Association'
    
  • Update knowledge base:

    [ec2-user@ip-172-31-77-194 ~]$ curl -X POST "http://${host_ip}:6007/v1/dataprep" \
         -H "Content-Type: multipart/form-data" \
         -F 'link_list=["https://opea.dev"]'
    {"status":200,"message":"Data preparation succeeded"}
    
  • Ask the question:

    curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
         "messages": "What is OPEA?"
     }'
    data: b'\n'
    
    data: b'O'
    
    data: b'PE'
    
    data: b'A'
    
    data: b' stands'
    
    data: b' for'
    
    data: b' Open'
    
    data: b' Platform'
    
    data: b' for'
    
    data: b' Enterprise'
    
    data: b' AI'
    
    data: b'.'
    
  • Delete link from the knowledge base:

    [ec2-user@ip-172-31-77-194 ~]$ # delete link
    curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
         -d '{"file_path": "https://opea.dev"}' \
         -H "Content-Type: application/json"
    {"detail":"File https://opea.dev not found. Please check file_path."}
    

    This is giving an error: opea-project/GenAIExamples#724

RAG using PDF

This is giving an error: opea-project/GenAIExamples#723

  • Download PDF:
    # use the raw URL; the github.com blob URL returns an HTML page, not the PDF
    curl -O https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/langchain/redis/data/nke-10k-2023.pdf
    
  • Update knowledge base:
    curl -X POST "http://${host_ip}:6007/v1/dataprep" \
       -H "Content-Type: multipart/form-data" \
       -F "files=@./nke-10k-2023.pdf"
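
  • To confirm the upload, the dataprep service also exposes a get_file route (the DATAPREP_GET_FILE_ENDPOINT from the .env above). A sketch, assuming the route accepts an empty POST and returns a JSON list of indexed files:

    ```shell
    # List documents currently indexed by the dataprep service
    curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
         -H "Content-Type: application/json"
    ```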
    

Debugging Tips

  • Disconnect a service from the Compose network, then shut the stack down:
    sudo docker network disconnect -f ubuntu_default tgi-service
    sudo docker compose down
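
  • A few more inspection commands that help when a service misbehaves (service names taken from the docker container ls output above):

    ```shell
    # Tail recent logs for a single service
    sudo docker logs -f --tail 100 tgi-service

    # Check a container's state and restart count
    sudo docker inspect --format '{{.State.Status}} restarts={{.RestartCount}}' tgi-service

    # See which networks Compose created
    sudo docker network ls
    ```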
    