A multi-agent conversational system must account for the memory limitations of language models. Large models (GPT-4, Claude, Mistral, Gemini) offer very wide context windows, but they still operate on a “sliding window” basis. This is illustrated in Fig. 1: when the context window fills up, new tokens “push” older ones out of the model’s memory, and earlier information is lost. In practice, this means that over a long conversation the model can forget earlier turns, start repeating itself, or make coherence errors. This phenomenon is sometimes called Context Degradation Syndrome: after just a few dozen to a few hundred exchanges, the model can “lose the thread” and generate increasingly imprecise answers ([Context Degradation Syndrome: When Large Language Models Lose the Plot](https://jameshoward.us/2024/11/26/context-degradation-syndrome-when-large-language-models-lose-the-plot)).
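The sliding-window behavior can be mimicked with a small sketch that trims a conversation to a fixed token budget, so the newest turns always fit while older ones fall out. The names here (`count_tokens`, `trim_history`, the word-count tokenizer) are illustrative, not any particular library's API:

```python
MAX_TOKENS = 50  # tiny budget, purely for demonstration

def count_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken): 1 word ≈ 1 token.
    return len(message.split())

def trim_history(history: list[str], budget: int = MAX_TOKENS) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for message in reversed(history):   # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break                       # older turns fall out of the window
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order
```

Everything dropped by `trim_history` is simply gone from the model's point of view, which is exactly why long conversations degrade without an external memory strategy.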
In multi-agent systems, agents can collaborate in two primary ways: handoff (transferring control) or agent-as-tool (using an agent as a tool). In the handoff pattern, one agent completes its part of the work and passes the entire context to the next “specialist” agent instead of continuing to process it itself ([Handoffs — AutoGen], [Multi-agent systems – Agent Development Kit]). Conversely, in the agent-as-tool pattern the main agent invokes a secondary agent like a function or API call, then integrates its response into the ongoing conversation ([Multi-agent systems – Agent Development Kit](https://google.github.io/adk-docs/agents/multi-agents/)).
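The difference between the two patterns can be sketched framework-agnostically. Both functions below use the same two hypothetical agents; only the control flow differs:

```python
def triage_agent(query: str) -> str:
    return f"triage: classified {query!r}"

def billing_agent(query: str) -> str:
    return f"billing: resolved {query!r}"

def run_with_handoff(query: str) -> str:
    """Handoff: triage does its part, then transfers the whole conversation
    to the specialist, whose answer becomes the final answer."""
    triage_agent(query)            # triage finishes its step...
    return billing_agent(query)    # ...and control moves entirely

def run_agent_as_tool(query: str) -> str:
    """Agent-as-tool: the main agent calls the specialist like a function
    and weaves the result back into its own reply."""
    sub_result = billing_agent(query)   # invoked like an API call
    return f"main agent answer, using [{sub_result}]"
```

In the handoff case the specialist owns the rest of the conversation; in the agent-as-tool case the main agent stays in charge and only borrows the specialist's capability.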
When designing APIs intended for intelligent agents (conversational LLMs, multi-agent orchestration, etc.), it’s helpful to treat them as “machine interfaces”—clear and unambiguous not only to developers, but to algorithms as well. A good starting point is to produce a full OpenAPI specification for your service (for example using FastAPI, which automatically generates Swagger/OpenAPI docs). The OpenAPI standard lets agents read the entire API definition—what resources and parameters are available, how to authenticate, what inputs to send, and what responses to expect ([Building an AI agent with OpenAPI: LangChain vs. Haystack]). Crafting complete, AI-ready documentation is critical ([Is Your API AI-ready? Our Guidelines and Best Practices]).
- Rich descriptions and metadata. Every endpoint and parameter should have an exhaustive description: not just a repeat of its name, but an explanation of “what this endpoint does,” “what data it expects,” and “what it returns.”
Chat-based AI apps often need to perform long-running operations (e.g. document generation, data analysis) that can’t block the user interaction. To keep the system responsive, these tasks must run in the background, outside the normal request–response cycle. A common pattern is to introduce an asynchronous task queue: the user’s request is immediately enqueued (e.g. returning “task accepted”) ([Background Tasks – FastAPI])([Using FastAPI with SocketIO to Display Real-Time Progress of Celery Tasks | by Fadi Shaar | Medium]), and a separate process (or cluster of workers) executes the job. This way, FastAPI can instantly handle further requests while heavy work proceeds independently, keeping the API responsive for other clients ([Using FastAPI with SocketIO to Display Real-Time Progress of Celery Tasks | by Fadi Shaar | Medium])([Background Tasks – FastAPI]).
For example, a contract-generation chatbot can immediately confirm receipt (“Document generation in progress…”) and deliver the finished document once the background job completes.
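The accept-then-work pattern can be sketched in-process with a worker thread and a status map; a production system would use a real queue (e.g. Celery with Redis) instead, but the shape of the flow is the same. All names here are hypothetical:

```python
import threading
import time
import uuid

TASKS: dict[str, str] = {}          # task_id -> "pending" | "done"

def generate_document(task_id: str) -> None:
    time.sleep(0.1)                 # stand-in for slow document generation
    TASKS[task_id] = "done"

def submit_job() -> str:
    """Immediately acknowledge the request; heavy work continues in the background."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = "pending"
    threading.Thread(target=generate_document, args=(task_id,)).start()
    return task_id                  # client polls a status endpoint with this id
```

`submit_job` returns instantly with a task id, which is exactly what lets the chat endpoint answer “task accepted” without blocking on the work itself.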
Modern conversational AI systems often split functionality into multiple tools or sub-agents, each specialized for a task (e.g. search, booking, math, etc.). When a user sends a query, the system must interpret intent and dispatch it to the right tool/agent. There are two broad approaches: letting a general-purpose LLM handle intent detection itself, or using a dedicated router component. In practice, many practitioners use a hybrid: an initial “router” classifies the intent and then a specialized agent or tool handles the task. Below we survey best practices and examples of each approach, referencing frameworks like LangChain and Semantic Router.
A common approach is to have the LLM itself decide which tool or chain to invoke. For example, one can prompt the model to output a JSON field indicating the desired “tool” or “function” (using OpenAI’s function calling or ChatGPT Plugins).
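The routing step itself is straightforward once the model's decision is in hand. In the sketch below, a keyword heuristic stands in for the LLM's JSON decision; in a real system the `{"tool": ..., "input": ...}` object would come from a function-calling response, and the tool names are invented for illustration:

```python
import json

TOOLS = {
    "search": lambda q: f"search results for {q!r}",
    "math":   lambda q: str(eval(q, {"__builtins__": {}})),  # toy calculator only
}

def route(query: str) -> str:
    # Stand-in for the LLM: pretend it returned {"tool": ..., "input": ...}.
    tool = "math" if any(ch.isdigit() for ch in query) else "search"
    decision = json.loads(json.dumps({"tool": tool, "input": query}))
    return TOOLS[decision["tool"]](decision["input"])
```

The important property is that the router's output is structured (parseable JSON naming a tool), so dispatch is a dictionary lookup rather than free-text interpretation.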
The development of sophisticated multi-agent systems introduces significant challenges in managing the flow of data and context between individual agents. As the complexity of these systems grows, with multiple agents collaborating to achieve a common goal, the potential for errors, inefficiencies, and unpredictable behavior due to mismanaged data also increases. Uncontrolled data flow can lead to agents receiving irrelevant or incorrectly formatted information, hindering their ability to perform their designated tasks effectively. The OpenAI Agents SDK is designed to address these challenges by providing a set of primitives, including handoffs, which facilitate the intelligent transfer of control between agents.[^1] This SDK aims to enable the construction of complex multi-agent workflows.
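One concrete form this control takes is filtering the context at the handoff boundary, so the receiving agent sees only what it needs. The sketch below is a toy illustration of that idea, not the SDK's own API; the field names and filter are invented:

```python
def handoff_filter(context: dict, allowed: set[str]) -> dict:
    """Strip irrelevant or sensitive keys before transferring control."""
    return {k: v for k, v in context.items() if k in allowed}

conversation_context = {
    "user_query": "refund order 42",
    "payment_token": "secret",      # should not reach the refund specialist
    "order_id": 42,
}

# Only the fields the specialist actually needs cross the handoff boundary.
forwarded = handoff_filter(conversation_context, {"user_query", "order_id"})
```

Filtering at the boundary keeps each agent's input small and well-formed, which is precisely the failure mode (irrelevant or incorrectly formatted information) the paragraph above describes.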
The integration of OpenAI's reasoning models (o-series) with the Agents SDK presents intriguing possibilities for developers who want to observe an agent's thinking process in real time. While there are limitations to accessing the complete "train of thought," there are several methods to stream insights into an agent's reasoning as it works.
OpenAI's reasoning models (o1, o3, o4 series) utilize a special type of processing called "reasoning tokens" in addition to standard input and output tokens. These reasoning tokens represent the model's internal thinking process as it breaks down problems and considers multiple approaches[^9].
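A consequence worth making explicit: reasoning tokens are hidden from the response but are still billed and still consume context, counted alongside visible output. The toy accounting sketch below illustrates this; the numbers and class are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Usage:
    input_tokens: int
    reasoning_tokens: int        # internal "thinking", never returned as text
    visible_output_tokens: int   # the completion the caller actually sees

    @property
    def billed_output_tokens(self) -> int:
        # Hidden reasoning tokens count toward output for billing and context.
        return self.reasoning_tokens + self.visible_output_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.billed_output_tokens
```

So a short visible answer can still be expensive if the model "thought" at length before producing it.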