Here's a high-level breakdown of your project in bullet points:
- Objective:
  - Provide an infinite memory solution for any LLM by storing and retrieving conversational context dynamically.
- Infinite Archival Memory:
  - Uses PostgreSQL with pgvector for persistent storage of chat interactions.
  - Enables context retrieval using vector similarity search, ensuring relevant past interactions are included in responses.
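A minimal sketch of what the archival table could look like, assuming psycopg2, a local `memory` database, and the 768-dimension output of `nomic-embed-text`; the table and column names are illustrative, not the project's actual schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=memory user=postgres")
with conn, conn.cursor() as cur:  # the with-block commits on success
    # pgvector must be installed on the server for this to succeed.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    # One row per interaction; 768 matches nomic-embed-text's dimension.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS interactions (
            id         BIGSERIAL PRIMARY KEY,
            query      TEXT NOT NULL,
            response   TEXT NOT NULL,
            embedding  VECTOR(768),
            created_at TIMESTAMPTZ DEFAULT now()
        );
    """)
```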
- Multi-Backend Support:
  - Compatible with both the OpenAI API and local LLMs (such as Ollama).
  - Dynamic switching between backends based on configuration.
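One way such switching could work, sketched with a hypothetical `LLM_BACKEND` environment variable (the project's actual config keys may differ):

```python
import os

def chat_backend() -> tuple[str, dict]:
    """Pick the chat endpoint and headers from configuration."""
    if os.getenv("LLM_BACKEND", "ollama") == "openai":
        return ("https://api.openai.com/v1/chat/completions",
                {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"})
    # Ollama serves an OpenAI-compatible API on its default port.
    return ("http://127.0.0.1:11434/v1/chat/completions", {})
```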
- Automatic Context Retrieval:
  - Each query triggers retrieval of the most relevant past interactions based on embedding similarity.
  - Provides seamless, contextual conversation without manual intervention.
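The retrieval step might look like this with pgvector's cosine-distance operator, assuming the illustrative `interactions` table sketched above:

```python
import psycopg2

def retrieve_context(query_embedding: list[float], k: int = 5):
    """Return the k past interactions closest to the query embedding."""
    conn = psycopg2.connect("dbname=memory user=postgres")
    with conn, conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; the string form
        # "[0.1, 0.2, ...]" is cast to a vector on the server.
        cur.execute(
            """SELECT query, response
                 FROM interactions
                ORDER BY embedding <=> %s::vector
                LIMIT %s""",
            (str(query_embedding), k),
        )
        return cur.fetchall()
```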
- Efficient Embedding Generation:
  - Leverages Nomic embeddings via Ollama for fast, local processing.
  - Automatically stores embeddings for both queries and responses, making future retrievals more relevant.
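Embedding generation against Ollama's standard embeddings endpoint could be as small as this (model name and local URL taken from the summary above):

```python
import requests

def embed(text: str) -> list[float]:
    """Fetch a Nomic embedding from the local Ollama server."""
    r = requests.post(
        "http://127.0.0.1:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["embedding"]
```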
- Auditable Conversation Logging:
  - Queries and responses are logged in the database for later review.
  - Ensures traceability and transparency for compliance and debugging purposes.
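A hedged sketch of the logging write, reusing the illustrative schema from earlier (function name is hypothetical):

```python
import psycopg2

def log_interaction(query: str, response: str, embedding: list[float]) -> None:
    """Persist one exchange so it can be audited and retrieved later."""
    conn = psycopg2.connect("dbname=memory user=postgres")
    with conn, conn.cursor() as cur:  # the with-block commits on success
        cur.execute(
            "INSERT INTO interactions (query, response, embedding) "
            "VALUES (%s, %s, %s::vector)",
            (query, response, str(embedding)),
        )
```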
- RESTful API Endpoints:
  - FastAPI backend exposing OpenAI-compatible endpoints for easy integration.
  - Simple cURL commands and client-side scripts for interaction.
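A stripped-down version of what an OpenAI-compatible FastAPI endpoint might look like; the echo body stands in for the real retrieval-and-generation pipeline:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

@app.post("/v1/chat/completions")
def chat(req: ChatRequest) -> dict:
    # A real handler would embed the last user message, pull context from
    # pgvector, call the configured backend, and log the exchange.
    user_msg = req.messages[-1]["content"]
    return {
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": f"(echo) {user_msg}"},
            "finish_reason": "stop",
        }],
    }
```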
- TUI Chat Interface (Testing Tool):
  - A command-line chat interface built with a rich prompt library.
  - Enables rapid local testing and debugging with vi-style line editing.
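Assuming the prompt library is prompt_toolkit (the summary doesn't name it), a vi-mode test loop against a locally running API could look like:

```python
import requests
from prompt_toolkit import PromptSession

session = PromptSession(vi_mode=True)  # vi-style line editing

while True:
    try:
        text = session.prompt("you> ")
    except (EOFError, KeyboardInterrupt):
        break
    r = requests.post(
        "http://127.0.0.1:8000/v1/chat/completions",  # assumed local port
        json={"model": "local",
              "messages": [{"role": "user", "content": text}]},
        timeout=120,
    )
    print(r.json()["choices"][0]["message"]["content"])
```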
- Backend Framework:
  - FastAPI for handling API requests.
  - Modular design to allow future enhancements such as multi-user support.
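FastAPI's routers are one common way to keep such a design modular; a hypothetical layout:

```python
from fastapi import APIRouter, FastAPI

chat_router = APIRouter(prefix="/v1")  # each feature area gets a router

@chat_router.post("/chat/completions")
def chat() -> dict:
    return {"object": "chat.completion", "choices": []}

app = FastAPI()
app.include_router(chat_router)  # e.g. a future users_router slots in here
```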
- Database Layer:
  - PostgreSQL with pgvector for efficient vector-based search.
  - Indexed embeddings for fast retrieval of context.
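The embedding index could be built like this, assuming the illustrative table from earlier (HNSW requires pgvector 0.5 or newer; ivfflat is the older alternative):

```python
import psycopg2

conn = psycopg2.connect("dbname=memory user=postgres")
with conn, conn.cursor() as cur:
    # HNSW index over cosine distance, matching the <=> queries above.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS interactions_embedding_idx "
        "ON interactions USING hnsw (embedding vector_cosine_ops);"
    )
```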
- Memory Management:
  - Functions for inserting, retrieving, and logging interactions.
  - Handles query embeddings, database interactions, and response enrichment.
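The enrichment step, i.e. folding retrieved history into the outgoing prompt, might look like this (a sketch; the actual prompt format is not specified in the summary):

```python
def build_messages(query: str, past: list[tuple[str, str]]) -> list[dict]:
    """Fold retrieved (query, response) pairs into the outgoing prompt."""
    context = "\n\n".join(f"User: {q}\nAssistant: {a}" for q, a in past)
    return [
        {"role": "system",
         "content": f"Relevant earlier conversation:\n{context}"},
        {"role": "user", "content": query},
    ]
```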
- Configuration System:
  - Centralized settings management via environment variables.
  - Supports flexible tuning of model endpoints and database connections.
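One plain-Python shape for such centralized settings; the variable names here are assumptions, not the project's real keys:

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    llm_backend: str = field(
        default_factory=lambda: os.getenv("LLM_BACKEND", "ollama"))
    embed_url: str = field(
        default_factory=lambda: os.getenv("EMBED_URL", "http://127.0.0.1:11434"))
    database_url: str = field(
        default_factory=lambda: os.getenv("DATABASE_URL", "postgresql://localhost/memory"))

settings = Settings()  # imported by the rest of the app
```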
- Installation:
  - PostgreSQL setup with the pgvector extension enabled.
  - Python dependencies managed via Poetry.
- Running the System:
  - API can be started via `poetry run` commands.
  - Embedding model must be running locally at `http://127.0.0.1:11434`.
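A `poetry run` script entry typically resolves to something like the following; `app.main:app` is an assumed module path for the FastAPI instance:

```python
import uvicorn

if __name__ == "__main__":
    # Equivalent to a `poetry run` script entry that launches the API.
    uvicorn.run("app.main:app", host="127.0.0.1", port=8000, reload=True)
```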
- Workflow:
  - User sends a query to the API.
  - System retrieves relevant past interactions from the database.
  - Query and context are passed to the selected LLM backend.
  - Response is generated and logged for future context.
  - User receives a contextually enriched reply.
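Seen from the client side, the whole loop reduces to a single hypothetical request against a locally running instance (port 8000 assumed):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",  # assumed local port
    json={"model": "local",
          "messages": [{"role": "user",
                        "content": "What did we decide about the schema last week?"}]},
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```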
This summary should give a clear, concise overview of the project. Let me know if you need any refinements!