- Large language models (LLMs) are computational models that can generate or model human language, with the ability to predict the next word in a sequence based on input text (00:01:42).
- LLMs are good at natural language processing tasks such as text generation, code generation, chatbots, conversational AI, information retrieval, and sentiment analysis, but can perform poorly on complex or logical tasks and lack dynamic, up-to-date knowledge (00:02:56).
- Retrieval-augmented generation (RAG) combines information retrieval techniques with generative AI to enhance the accuracy and relevance of LLMs, allowing them to fetch relevant documents and generate responses based on up-to-date information (00:04:29).
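The retrieve-then-generate pattern described above can be sketched in a few lines. This is a minimal illustration, not a real implementation: the keyword-overlap retriever and the stubbed `generate` function are hypothetical stand-ins for a vector database and an actual LLM call.

```python
# Minimal RAG sketch: retrieve relevant documents, then ground the
# generation step in them. The retriever and generate() are illustrative
# stand-ins for a real vector store and model call.

DOCUMENTS = {
    "pricing": "The Pro plan costs $20 per user per month.",
    "support": "Support tickets are answered within 24 hours.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(
        DOCUMENTS.values(),
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stub LLM call: a real system would send this prompt to a model."""
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return f"[LLM response grounded in {len(context)} retrieved document(s)]"

query = "How much does the Pro plan cost?"
answer = generate(query, retrieve(query))
```

In a production system the `retrieve` step would query an embedding index, but the control flow, fetch first, then generate with the fetched context in the prompt, is the same.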
- LLM agents extend the capabilities of LLMs by allowing them to interact with external sources or tools, focusing on complex reasoning and planning tasks, and are designed to take actions and make intelligent decisions (00:06:10).
- An LLM agent typically comprises an LLM, access to various tools, and a memory component that keeps track of decisions and interactions (00:06:40).
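The three components above (LLM, tools, memory) can be sketched as a small class. All names here are hypothetical; the lambda standing in for the LLM simply mimics a model choosing a tool by name.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class LLMAgent:
    """Illustrative agent: an LLM, a registry of tools, and a memory
    of past decisions and interactions. Names are invented for the sketch."""
    llm: Callable[[str], str]                                  # decides which tool to use
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)            # decision/interaction log

    def run(self, query: str) -> str:
        tool_name = self.llm(query)            # in a real agent, an LLM call
        result = self.tools[tool_name](query)  # invoke the chosen tool
        self.memory.append(f"{query} -> {tool_name}")  # track the decision
        return result

# Stub "LLM" that routes any query mentioning "invoice" to the accounts tool.
agent = LLMAgent(
    llm=lambda q: "accounts" if "invoice" in q else "search",
    tools={"accounts": lambda q: "accounts answer",
           "search": lambda q: "search answer"},
)
reply = agent.run("Where is my invoice?")
```

The memory list is what lets the agent refer back to earlier decisions across turns, which plain stateless LLM calls cannot do.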
- There are different types of LLM agents, including reactive agents, rule-based agents, and single- and multi-agent systems; which to use depends on the nature of the use case (00:09:30).
- A simple approach to routing questions to tools involves predefined rules, whereas a more intelligent approach uses a ReAct agent that reasons and acts without predefined rules, making decisions through an iterative feedback loop (00:10:49).
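The iterative feedback loop of a ReAct-style agent can be sketched as follows. The `decide_next_action` stub stands in for the LLM's reasoning step, and the tool names are invented; a real agent would prompt a model with the question plus the scratchpad on every turn.

```python
# Sketch of a ReAct-style loop: alternate between deciding an action and
# executing it, feeding each observation back in until a "finish" decision.

def react_loop(question: str, tools: dict, max_steps: int = 5) -> str:
    scratchpad = []  # accumulated (action, argument, observation) history
    for _ in range(max_steps):
        # In a real system this is an LLM call over question + scratchpad.
        action, arg = decide_next_action(question, scratchpad)
        if action == "finish":
            return arg
        observation = tools[action](arg)          # act, then observe
        scratchpad.append((action, arg, observation))
    return "gave up after max_steps"

def decide_next_action(question, scratchpad):
    """Stub policy standing in for the LLM's reasoning step."""
    if not scratchpad:
        return "lookup", question                 # first step: gather information
    last_observation = scratchpad[-1][2]
    return "finish", f"Answer based on: {last_observation}"

tools = {"lookup": lambda q: "HR policy allows 25 vacation days"}
answer = react_loop("How many vacation days do I get?", tools)
```

The contrast with rule-based routing is that nothing here hard-codes which tool handles which question: each iteration's decision can depend on everything observed so far.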
- A use case for an LLM agent is a unified chat assistant for a fictional cloud company called Cloud Forge Dynamics, which has several departments whose documents can be interacted with through different chatbots (00:11:42).
- The LLM agent uses a ReAct agent to route questions to the right department's documents; based on the query, the agent decides which tool to call, such as the product assistant, HR assistant, or accounts assistant (00:12:47).
- The system architecture involves a user interface that calls the agent; the agent sends the query to the LLM, the LLM decides which tool to call, and on the tool-services side a vector database is used to look up relevant information (00:17:59).
- The repository for the LLM agent is open source and includes deployment files and example testing code for the RAG tools and the agent architecture components (00:19:27).
- The agent routing approach enables a multi-agent system where the routing agent can call other agents to collaborate and generate the final response (00:20:38).
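The multi-agent extension can be sketched as a routing agent that delegates to department agents and merges their replies. The department names mirror the Cloud Forge Dynamics example above, but the keyword routing rule is an invented stand-in for an LLM-driven decision.

```python
# Sketch of the multi-agent pattern: a routing agent consults one or more
# department agents and combines their replies into the final response.
# The keyword checks stand in for an LLM deciding which agents to involve.

def hr_agent(query: str) -> str:
    return "HR: see the leave policy document"

def accounts_agent(query: str) -> str:
    return "Accounts: see the invoice records"

def routing_agent(query: str) -> str:
    consulted = []
    if "leave" in query or "vacation" in query:
        consulted.append(hr_agent)
    if "invoice" in query or "payment" in query:
        consulted.append(accounts_agent)
    # Collaboration step: merge every consulted agent's reply.
    return " | ".join(agent(query) for agent in consulted) or "No agent matched"

final = routing_agent("Is my leave payment on the last invoice?")
```

A question spanning two departments triggers both agents, which is the collaborative case the routing approach enables; a single-department question involves only one.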
- Challenges with the agents approach include debugging difficulties, latency, load balancing, and security concerns due to calling multiple APIs and external tools (00:22:10).
- LangChain is the predominantly explored agent framework for backend agent operations, but other frameworks, such as the B agent framework and OpenAI's open platform, are also being considered for comparison (00:23:33).
- The choice of framework depends on the features each offers, such as logging and built-in metrics and functionality (00:24:23).
- At their core, these agents call tools in the backend, so the core functionality is available across frameworks for use in user-facing chatbots or internal applications (00:24:41).