Here's a high-level breakdown of your project in bullet points:

Project Overview: Eidolon - Infinite Contextual Memory for LLMs

  • Objective:
    • Provide an effectively unlimited memory layer for any LLM by persistently storing conversational context and dynamically retrieving the most relevant pieces at query time.

Key Features

  • Infinite Archival Memory:

    • Uses PostgreSQL with the pgvector extension for persistent storage of chat interactions.
    • Retrieves context via vector similarity search, so relevant past interactions are injected into the prompt (a minimal retrieval sketch appears after this list).
  • Multi-Backend Support:

    • Compatible with both the OpenAI API and local models served via Ollama.
    • Switches between backends dynamically based on configuration.
  • Automatic Context Retrieval:

    • Each query triggers retrieval of the most relevant past interactions based on embedding similarity.
    • Provides seamless, contextual conversation without manual intervention.
  • Efficient Embedding Generation:

    • Leverages Nomic embeddings via Ollama for fast, local processing.
    • Automatically stores embeddings for both queries and responses to enhance future relevance.
  • Auditable Conversation Logging:

    • Queries and responses are logged in the database for later review.
    • Ensures traceability and transparency for compliance and debugging purposes.
  • RESTful API Endpoints:

    • FastAPI backend exposing OpenAI-compatible endpoints for easy integration (a minimal endpoint sketch appears after this list).
    • Can be exercised with simple cURL commands or client-side scripts.
  • TUI Chat Interface (Testing Tool):

    • A command-line chat interface built with a rich prompt library (sketched after this list).
    • Enables rapid local testing and debugging with vi-style line editing.
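
To make the memory pipeline concrete, here is a minimal sketch of embedding generation and similarity retrieval. It is illustrative only: the `memory` table, the `nomic-embed-text` model name, the connection string, and the 768-dimensional column are assumptions, not the project's actual identifiers.

```python
# Hypothetical sketch of embedding + retrieval; table, model, and DSN are assumed.
import psycopg2   # PostgreSQL driver (pgvector is enabled server-side)
import requests   # HTTP client for the local Ollama API

OLLAMA_URL = "http://127.0.0.1:11434"      # default Ollama endpoint
EMBED_MODEL = "nomic-embed-text"           # assumed Nomic embedding model
DSN = "dbname=eidolon"                     # placeholder connection string

def embed(text: str) -> list[float]:
    """Generate an embedding for `text` via Ollama's embeddings endpoint."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBED_MODEL, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def retrieve_context(query: str, top_k: int = 5) -> list[str]:
    """Return the `top_k` stored messages most similar to `query`."""
    query_vec = embed(query)
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        # `<=>` is pgvector's cosine-distance operator; smaller means closer.
        cur.execute(
            "SELECT content FROM memory "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_vec), top_k),
        )
        return [row[0] for row in cur.fetchall()]
```

Every query would pass through something like `retrieve_context` before being sent to the LLM, which is what makes the context retrieval automatic.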
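
The OpenAI-compatible API and backend switching could look roughly like the following. Again a sketch under assumptions: the `LLM_BACKEND` variable, URLs, and request shape are illustrative, and in the real service the retrieved context would be merged into `messages` before forwarding.

```python
# Hypothetical OpenAI-compatible endpoint with backend switching; names assumed.
import os
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]   # e.g. [{"role": "user", "content": "..."}]

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest) -> dict:
    """Forward the request to OpenAI or a local Ollama model based on config."""
    # Context retrieved from pgvector (see the sketch above) would be injected
    # into req.messages at this point before the request is forwarded.
    if os.environ.get("LLM_BACKEND", "ollama") == "openai":
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": req.model, "messages": req.messages},
            timeout=120,
        )
    else:
        # Ollama also exposes an OpenAI-compatible API under /v1.
        resp = requests.post(
            "http://127.0.0.1:11434/v1/chat/completions",
            json={"model": req.model, "messages": req.messages},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()
```

Because the endpoint mirrors the OpenAI schema, the same cURL commands and client scripts work unchanged against either backend.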
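
The bullet above does not name the prompt library, so the TUI sketch below assumes prompt_toolkit, which provides vi-style line editing out of the box; the API address and model name are also assumptions.

```python
# Hypothetical TUI sketch assuming prompt_toolkit; the real library may differ.
import requests
from prompt_toolkit import PromptSession

session = PromptSession(vi_mode=True)   # vi-style line editing

while True:
    try:
        query = session.prompt("you> ")
    except (EOFError, KeyboardInterrupt):
        break
    resp = requests.post(
        "http://127.0.0.1:8000/v1/chat/completions",   # assumed local API address
        json={"model": "llama3",
              "messages": [{"role": "user", "content": query}]},
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])
```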

Technical Components

  • Backend Framework:

    • FastAPI for handling API requests.
    • Modular design to allow future enhancements such as multi-user support.
  • Database Layer:

    • PostgreSQL with pgvector for efficient vector-based search.
    • Indexed embeddings for fast retrieval of context (see the schema sketch after this list).
  • Memory Management:

    • Functions for inserting, retrieving, and logging interactions.
    • Handles query embeddings, database interactions, and response enrichment.
  • Configuration System:

    • Centralized settings management via environment variables.
    • Supports flexible tuning of model endpoints and database connections (illustrated after this list).
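
As a rough illustration of the database layer, memory management, and auditable logging, the sketch below creates the pgvector schema and persists one message. The table and index names, the 768-dimensional column, and the DSN are assumptions.

```python
# Hypothetical schema and logging sketch; all identifiers are illustrative.
import psycopg2

DSN = "dbname=eidolon"   # placeholder connection string

SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS memory (
    id         BIGSERIAL PRIMARY KEY,
    role       TEXT NOT NULL,            -- 'user' or 'assistant'
    content    TEXT NOT NULL,
    embedding  vector(768),              -- Nomic embeddings are 768-dimensional
    created_at TIMESTAMPTZ DEFAULT now()
);
-- Approximate-nearest-neighbour index for fast similarity search.
CREATE INDEX IF NOT EXISTS memory_embedding_idx
    ON memory USING ivfflat (embedding vector_cosine_ops);
"""

def init_db() -> None:
    """Create the pgvector extension, memory table, and ANN index."""
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(SCHEMA)

def log_interaction(role: str, content: str, embedding: list[float]) -> None:
    """Persist one message with its embedding for auditing and future recall."""
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO memory (role, content, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (role, content, str(embedding)),
        )
```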
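
Centralized configuration via environment variables might be as simple as the following; the variable names and defaults are illustrative, not the project's actual settings.

```python
# Hypothetical settings sketch; variable names and defaults are assumptions.
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    llm_backend: str = os.environ.get("LLM_BACKEND", "ollama")
    openai_api_key: str = os.environ.get("OPENAI_API_KEY", "")
    ollama_url: str = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")
    database_dsn: str = os.environ.get("DATABASE_DSN", "dbname=eidolon")
    embed_model: str = os.environ.get("EMBED_MODEL", "nomic-embed-text")

settings = Settings()   # imported by the API, memory, and TUI modules
```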

Setup & Deployment

  • Installation:

    • PostgreSQL setup with pgvector extension enabled.
    • Python dependencies managed via Poetry.
  • Running the System:

    • The API can be started with poetry run commands.
    • The embedding model must be running locally at http://127.0.0.1:11434 (Ollama's default port); a quick reachability check is sketched after this list.
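
Before starting the API, a quick check that the local embedding server is reachable can save a confusing failure later. This is a small illustrative helper, not part of the project.

```python
# Quick check that the local Ollama server is reachable before starting the API.
import sys
import requests

try:
    requests.get("http://127.0.0.1:11434", timeout=2).raise_for_status()
except requests.RequestException as exc:
    sys.exit(f"Ollama is not reachable at 127.0.0.1:11434: {exc}")
print("Embedding backend is up.")
```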

Example Use Case

  1. User sends a query to the API.
  2. System retrieves relevant past interactions from the database.
  3. Query and context are passed to the selected LLM backend.
  4. Response is generated and logged for future context.
  5. User receives a contextually enriched reply.
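
Tying the five steps together, the flow might be orchestrated roughly as below. The helpers mirror the earlier sketches and are passed in as parameters so the snippet stays self-contained; the prompt format is an assumption.

```python
# Hypothetical end-to-end flow; helper functions mirror the sketches above and
# are injected as parameters so this snippet stands on its own.
from typing import Callable

def answer(
    query: str,                                            # step 1: user query
    embed: Callable[[str], list[float]],
    retrieve_context: Callable[[str], list[str]],
    call_llm: Callable[[list[dict]], str],
    log_interaction: Callable[[str, str, list[float]], None],
) -> str:
    context = retrieve_context(query)                      # step 2: recall memory
    messages = [
        {"role": "system",
         "content": "Relevant history:\n" + "\n".join(context)},
        {"role": "user", "content": query},                # step 3: query + context
    ]
    reply = call_llm(messages)                             # step 4: generate...
    log_interaction("user", query, embed(query))           # ...and log both sides
    log_interaction("assistant", reply, embed(reply))
    return reply                                           # step 5: enriched reply
```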

This summary should give a clear, concise overview of the project. Let me know if you need any refinements!
