
@usrbinkat
Created October 1, 2024 06:49
Ollama + Open-Webui + Nvidia/CUDA + Docker + docker-compose


Here's a sample README.md, generated by Llama3.2 from the docker-compose.yaml below, that explains the purpose and usage of this Docker Compose configuration:

ollama-portal

A multi-container Docker application for serving the OLLAMA API.

Overview

This repository provides a Docker Compose configuration for running two containers: open-webui and ollama. The open-webui container serves a web interface that talks to the ollama container, which exposes the Ollama REST API for running large language models. Together they let users access Ollama's functionality through a user-friendly web interface.
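
As a quick start, assuming the docker-compose.yaml shown at the end of this page is saved in the current directory and the host already has NVIDIA drivers and the NVIDIA Container Toolkit installed, bringing the stack up looks roughly like this:

    # Start both containers in the background
    docker compose up -d

    # Watch the logs until both services report they are ready
    docker compose logs -f

    # Then open the web UI in a browser at http://localhost:8080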

Architecture

The application consists of two main components:

  • OLLAMA: A GPU-accelerated inference service for large language models that exposes a RESTful API for pulling and serving models (a quick check of this API is shown after this list).
  • Open-WebUI: A web-based interface for interacting with the OLLAMA API, providing a simple and intuitive way to deploy, manage, and chat with models.
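
Once the stack is running, a quick way to confirm the API half of this architecture is responding is to query it from the host; this is just a sketch that assumes port 11434 is published as in the compose file below:

    # The root endpoint should answer with "Ollama is running"
    curl http://localhost:11434/

    # List the models currently available to the API (empty until something is pulled)
    curl http://localhost:11434/api/tags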

Docker Compose Configuration

The Docker Compose configuration file (docker-compose.yaml) defines several key settings:

  • Services: The application consists of two services, open-webui and ollama. Each service is defined with its own set of environment variables, volumes, and ports (the fully resolved configuration can be inspected with the command shown after this list).
  • Environment Variables:
      • MODEL_DOWNLOAD_DIR: Specifies the directory for storing downloaded models.
      • OLLAMA_API_BASE_URL: Sets the base URL for the OLLAMA API.
      • LOG_LEVEL: Configures the log level for both containers.
  • Volumes: The application mounts several named volumes to share data between containers. These include:
      • data: For storing user input and model artifacts.
      • models: For sharing downloaded models between the containers.
      • ollama: For storing Ollama's own state (mounted at /root/.ollama).
      • open-webui: For the web UI's configuration (mounted at /config).
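
Compose can validate and print the fully merged configuration, which is a quick way to confirm these settings before starting anything; for example, from the directory containing docker-compose.yaml:

    # Validate the file and print the resolved configuration
    docker compose config

    # List just the service names that would be created
    docker compose config --services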

Container Configuration

The Docker Compose configuration defines the following container configurations:

  • OLLAMA Container:
      • Uses the official OLLAMA image (ollama/ollama:latest).
      • Enables NVIDIA GPU acceleration using the runtime: nvidia option.
      • Reserves all GPUs available on the host (deploy.resources.reservations.devices with count: all). Note that the file also sets CUDA_VISIBLE_DEVICES=0, which restricts Ollama to the first GPU; remove or adjust that variable to use additional GPUs. A quick way to confirm GPU access is shown after this list.
  • Open-WebUI Container:
      • Uses the official Open-WebUI image (ghcr.io/open-webui/open-webui:main).
      • Sets environment variables for the model download directory and the OLLAMA API URL.
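
One way to confirm the GPU wiring described above is to run nvidia-smi inside the running ollama container; this sketch assumes the stack is already up and the NVIDIA Container Toolkit is installed on the host:

    # List the GPUs visible inside the ollama container
    # ("ollama" is the container_name set in the compose file)
    docker exec -it ollama nvidia-smi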

Networking

The application uses a single bridge network (ollama-net) that connects both containers. Compose's built-in DNS lets the open-webui container reach the API at http://ollama:11434 by service name.
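
You can confirm both containers are attached to the network from the host; note that Compose prefixes the network name with the project name (typically the directory name), shown here as a placeholder:

    # Find the network (its name ends in "ollama-net")
    docker network ls | grep ollama-net

    # Show which containers are attached (replace <project> with your project name)
    docker network inspect <project>_ollama-net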

Running in Production

To run this application in production, you'll need to:

  • Start the stack and pull the models you want to serve into the ollama container.
  • Confirm the open-webui container can reach the OLLAMA API (it is preconfigured to use http://ollama:11434).
  • Mount the necessary volumes and adjust configuration variables as needed; a minimal command sequence is sketched below.
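
A minimal sequence for the steps above might look like the following; the llama3.2 model tag is only an example, and the /api paths are Ollama's standard REST endpoints:

    # 1. Start the stack
    docker compose up -d

    # 2. Pull a model into the ollama container (stored in the "ollama" volume)
    docker exec -it ollama ollama pull llama3.2

    # 3. Verify the API can generate with the pulled model
    curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Hello", "stream": false}'

    # 4. Sign in to Open-WebUI at http://localhost:8080 and select the model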

Troubleshooting

If you encounter issues while running this application, start with the container logs and confirm that Docker can see the NVIDIA runtime; the Docker Compose troubleshooting documentation covers Compose-specific problems in more depth.
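
A few commands cover the most common failure points (containers not starting, or the GPU not being visible to Docker):

    # Check container status and recent logs
    docker compose ps
    docker compose logs --tail=100 ollama
    docker compose logs --tail=100 open-webui

    # Confirm Docker knows about the NVIDIA runtime requested by the compose file
    docker info | grep -i nvidia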

Security Considerations

This configuration does not add authentication or TLS on its own: traffic between the containers travels over plain HTTP on the bridge network, and both published ports are reachable on all host interfaces. Consider the following measures:

  • Network exposure: Bind the published ports to localhost (for example "127.0.0.1:11434:11434") or place both services behind a TLS-terminating reverse proxy.
  • API access: The OLLAMA API accepts requests without authentication by default, so restrict which hosts can reach port 11434.
  • Updates: Pin image tags and keep the ollama and open-webui images up to date to pick up security fixes.

Performance Optimization

To optimize performance, consider the following:

  • Model caching: Keep frequently used models loaded in memory (Ollama's keep-alive setting) and store model volumes on fast local storage so reloads are quick.
  • Container orchestration: Use a container orchestration tool (e.g., Kubernetes) to manage and scale your containers.
  • GPU acceleration: Confirm the NVIDIA runtime is actually in use and, on multi-GPU hosts, choose which GPUs Ollama may see; the monitoring commands after this list help verify this.
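
To see whether the GPU is actually being exercised while a request is in flight, watch GPU and container utilization from the host (nvidia-smi ships with the NVIDIA driver):

    # GPU utilization and memory, refreshed every second
    watch -n 1 nvidia-smi

    # CPU and memory usage of the two containers
    docker stats ollama open-webui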

This enhanced README file provides more in-depth technical explanations, including architecture, Docker Compose configuration, container configurations, networking, security considerations, and performance optimization. If you have any further questions or concerns, feel free to open a discussion on our GitHub page!

docker-compose.yaml

services:
  open-webui:
    container_name: open-webui
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - MODEL_DOWNLOAD_DIR=/models
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - OLLAMA_API_URL=http://ollama:11434
      - LOG_LEVEL=debug
    volumes:
      - data:/data
      - models:/models
      - open-webui:/config
    ports:
      - "8080:8080"
    logging:
      driver: json-file
      options:
        max-size: "5m"
        max-file: "2"
    depends_on:
      - ollama
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - ollama-net
    restart: unless-stopped

  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - CUDA_VISIBLE_DEVICES=0
      - LOG_LEVEL=debug
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    volumes:
      - ollama:/root/.ollama
      - models:/models
    ports:
      - "11434:11434"
    logging:
      driver: json-file
      options:
        max-size: "5m"
        max-file: "2"
    networks:
      - ollama-net
    restart: unless-stopped

volumes:
  data:
  models:
  ollama:
  open-webui:

networks:
  ollama-net:
    driver: bridge