- Spin up an Ubuntu 24.04 VM and install Docker following the instructions at https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7
- Pull the Docker image:
sudo docker pull opea/codegen:latest
- Replace the HuggingFace API token and the private IP address of the host below, and copy the contents into a file named .env:

export host_ip="172.31.50.223"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
export LLM_MODEL_ID="deepseek-ai/deepseek-coder-6.7b-instruct"
export TGI_LLM_ENDPOINT="http://${host_ip}:8028"
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7778/v1/codegen"
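Since the file uses plain export statements, it can be sourced into the shell so the variables are available when Docker Compose runs in the steps below; a quick sanity check (the echo line is only illustrative):

# Load the variables into the current shell and confirm substitution
source .env
echo "${BACKEND_SERVICE_ENDPOINT}"   # expect http://172.31.50.223:7778/v1/codegen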
- Download Docker Compose file:
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/CodeGen/docker_compose/intel/cpu/xeon/compose.yaml
- Start the application:
sudo docker compose -f compose.yaml up -d
- Verify the list of containers:
ubuntu@ip-172-31-50-223:~$ sudo docker container ls
CONTAINER ID   IMAGE                    COMMAND                  CREATED         STATUS         PORTS                                       NAMES
ba99bf66e45b   opea/codegen-ui:latest   "docker-entrypoint.s…"   7 minutes ago   Up 7 minutes   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   codegen-xeon-ui-server
31a19966946b   opea/codegen:latest      "python codegen.py"      7 minutes ago   Up 7 minutes   0.0.0.0:7778->7778/tcp, :::7778->7778/tcp   codegen-xeon-backend-server
1c1649d31187   opea/llm-tgi:latest      "bash entrypoint.sh"     7 minutes ago   Up 7 minutes   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-tgi-server
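Before calling the backend, the underlying TGI endpoint can be probed directly. This is a sketch that assumes the TGI service defined in compose.yaml listens on port 8028, per TGI_LLM_ENDPOINT in .env; /health and /generate are TGI's standard routes:

# Returns 200 once the model weights are loaded
curl http://${host_ip}:8028/health
# Minimal request against TGI's native generate API
curl http://${host_ip}:8028/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "def fibonacci(n):", "parameters": {"max_new_tokens": 32}}'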
- Access the service using the cURL command. This is currently causing opea-project/GenAIExamples#824:

curl http://${host_ip}:7778/v1/codegen \
  -H "Content-Type: application/json" \
  -d '{"messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."}'
- Pull the Docker image:
sudo docker pull opea/codetrans:latest
- Replace the HuggingFace API token and the IP address of the host below, and copy the contents into a file named .env:

export host_ip="External_Public_IP"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
# Example: NGINX_PORT=80
export NGINX_PORT=${your_nginx_port}
export LLM_MODEL_ID="HuggingFaceH4/mistral-7b-grok"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:7777/v1/codetrans"
export FRONTEND_SERVICE_IP=${host_ip}
export FRONTEND_SERVICE_PORT=5173
export BACKEND_SERVICE_NAME=codetrans
export BACKEND_SERVICE_IP=${host_ip}
export BACKEND_SERVICE_PORT=7777
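NGINX_PORT is left as a placeholder above; one way to pin a concrete value and source the file before continuing (the value shown is a hypothetical example, following the comment in the file):

# Hypothetical example value; replace with your own before sourcing
export your_nginx_port=80
source .env
echo "${NGINX_PORT} ${BACKEND_SERVICE_ENDPOINT}"   # quick sanity check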
- Download Docker Compose file:
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/CodeTrans/docker_compose/intel/cpu/xeon/compose.yaml
- Start the application. This is causing opea-project/GenAIExamples#830:

sudo docker compose -f compose.yaml up -d
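The CodeTrans walkthrough stops here because of the issue above. Once it is resolved, the service should answer a request of the following shape; the field names follow the upstream CodeTrans example and the payload is illustrative:

curl http://${host_ip}:7777/v1/codetrans \
  -H "Content-Type: application/json" \
  -d '{"language_from": "Golang", "language_to": "Python", "source_code": "package main\n\nimport \"fmt\"\n\nfunc main() {\n    fmt.Println(\"Hello, World!\")\n}"}'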
- Pull the Docker image:
sudo docker pull opea/docsum:latest
- Replace the HuggingFace API token and the private IP address of the host below, and copy the contents into a file named .env:

export host_ip="172.31.54.128"
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/docsum"
- Download Docker Compose file:
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/DocSum/docker_compose/intel/cpu/xeon/compose.yaml
- Start the application:
sudo docker compose -f compose.yaml up -d
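The first start downloads the Intel/neural-chat-7b-v3-3 weights, so the TGI container can take a few minutes to become ready; progress can be followed with (tgi-service is the container name shown in the listing below):

sudo docker logs -f tgi-service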
- Verify the list of containers:
ubuntu@ip-172-31-54-128:~$ sudo docker container ls
CONTAINER ID   IMAGE                                                                  COMMAND                  CREATED              STATUS              PORTS                                       NAMES
68ca3c32ecdd   opea/docsum-ui:latest                                                  "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   docsum-xeon-ui-server
26b0d896b3c7   opea/docsum:latest                                                     "python docsum.py"       About a minute ago   Up About a minute   0.0.0.0:8888->8888/tcp, :::8888->8888/tcp   docsum-xeon-backend-server
bd0606afb0fd   opea/llm-docsum-tgi:latest                                             "bash entrypoint.sh"     About a minute ago   Up About a minute   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-docsum-server
06d4446bd9b1   ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu   "text-generation-lau…"   2 minutes ago        Up About a minute   0.0.0.0:8008->80/tcp, [::]:8008->80/tcp     tgi-service
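Note that tgi-service maps host port 8008 to the container's port 80. TGI exposes a /health route that returns 200 once the model is loaded, which makes a convenient readiness check before calling the backend:

curl -i http://${host_ip}:8008/health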
- Access the service using the cURL command. This is currently causing opea-project/GenAIExamples#835:

curl http://${host_ip}:8888/v1/docsum \
  -H "Content-Type: application/json" \
  -d '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
- Pull the Docker image (instructions should be updated to use pre-built images: opea-project/GenAIExamples#836):

sudo docker pull opea/translation:latest
- Replace the HuggingFace API token and the private IP address of the host below, and copy the contents into a file named .env:

export host_ip="172.31.49.59"   # private IP address
export LLM_MODEL_ID="haoranxu/ALMA-13B"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export MEGA_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/translation"
- Download Docker Compose file:
curl -O https://raw.githubusercontent.com/opea-project/GenAIExamples/main/Translation/docker_compose/intel/cpu/xeon/compose.yaml
- Start the application:
sudo docker compose -f compose.yaml up -d
- Verify the list of containers:

ubuntu@ip-172-31-49-59:~$ sudo docker container ls

- Check the logs:

2024-09-18T17:04:48.079544Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-09-18T17:04:58.095135Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-09-18T17:05:08.110327Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-09-18T17:05:10.114867Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-09-18T17:05:10.213727Z  INFO shard-manager: text_generation_launcher: Shard ready in 422.780717591s rank=0
2024-09-18T17:05:10.298830Z  INFO text_generation_launcher: Starting Webserver
2024-09-18T17:05:10.440064Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:90: Warming up model
2024-09-18T17:05:39.868187Z  INFO text_generation_launcher: Cuda Graphs are disabled (CUDA_GRAPHS=None).
2024-09-18T17:05:39.868785Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:102: Setting max batch total tokens to 45136
2024-09-18T17:05:39.869908Z  INFO text_generation_router_v3: backends/v3/src/lib.rs:126: Using backend V3
2024-09-18T17:05:39.876810Z  INFO text_generation_router::server: router/src/server.rs:1651: Using the Hugging Face API
2024-09-18T17:05:39.895544Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-09-18T17:05:40.482470Z  INFO text_generation_router::server: router/src/server.rs:2349: Serving revision 822086d00e1e61b0c3f99bea3577a916b4360001 of model haoranxu/ALMA-13B
2024-09-18T17:05:40.483950Z  INFO text_generation_router::server: router/src/server.rs:1781: Using config Some(Llama)
2024-09-18T17:05:40.483965Z  WARN text_generation_router::server: router/src/server.rs:1783: Could not find a fast tokenizer implementation for haoranxu/ALMA-13B
2024-09-18T17:05:40.483967Z  WARN text_generation_router::server: router/src/server.rs:1784: Rust input length validation and truncation is disabled
2024-09-18T17:05:40.483989Z  WARN text_generation_router::server: router/src/server.rs:1928: Invalid hostname, defaulting to 0.0.0.0
2024-09-18T17:05:40.490701Z  INFO text_generation_router::server: router/src/server.rs:2311: Connected
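The "Shard ready in 422.78s" line above shows the model load can take around seven minutes, so requests sent earlier will fail. A small sketch that blocks until TGI reports healthy (assumes the 8008 port mapping from the compose file):

# Poll TGI's /health route every 10 seconds until it returns success
until curl -sf http://${host_ip}:8008/health > /dev/null; do
  echo "waiting for TGI to become ready..."
  sleep 10
done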
- Access the service using the cURL command:

ubuntu@ip-172-31-49-59:~$ curl http://${host_ip}:8888/v1/translation \
  -H "Content-Type: application/json" \
  -d '{ "language_from": "Hindi","language_to": "English","source_language": "आप कैसे हो "}'
data: b' How'
data: b' are'
data: b' you'
data: b'?'
data: b'</s>'
data: [DONE]
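The response is streamed as server-sent events, one token per data: line. To collapse the stream into plain text, the framing can be stripped; this sed pattern is a sketch tied to the exact data: b'...' format shown in the transcript above:

# Stream the response and join the tokens into a single line
curl -sN http://${host_ip}:8888/v1/translation \
  -H "Content-Type: application/json" \
  -d '{ "language_from": "Hindi","language_to": "English","source_language": "आप कैसे हो "}' \
  | sed -n "s/^data: b'\(.*\)'$/\1/p" | tr -d '\n'
# yields: How are you?</s>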