
@hugobowne
hugobowne / AI Agent Harness — Survey of Open-Source Evaluation Frameworks
Created November 13, 2025 11:10
Survey of open-source evaluation/agent harness frameworks (lm-eval-harness, OpenAI Evals, LangChain, HELM, lmms-eval, leaderboards)
# AI Agent Harness Survey
Short Description of Research Question
I surveyed major open-source projects and leaderboards for evaluating language models (LLMs), multimodal models (LMMs), and agent systems to answer: "What existing harnesses, frameworks, and leaderboards are available for evaluating LLMs/LMMs and building/running agent harnesses?"
## Summary of Findings
- lm-evaluation-harness (EleutherAI)
  - A mature, widely used framework for few-shot evaluation of language models. Supports many model backends (HF transformers, vLLM, GGUF/llama.cpp, OpenAI/Anthropic/TextSynth APIs, SGLang, NeMo, etc.), more than 60 academic benchmark tasks, flexible prompt templating (Jinja2, Promptsource), caching, logging, and hub integration. It is the backend for Hugging Face's Open LLM Leaderboard.
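Frameworks like this all share the same inner loop: render a prompt per example, query a model, and score the response against a reference. A minimal sketch of that loop, using a hypothetical task format and model stub rather than lm-evaluation-harness's real API:

```python
# Minimal sketch of what an eval harness does internally. The task dict
# and `model_fn` interface here are illustrative, not the real library API.

def evaluate(task, model_fn):
    """Run every example in `task` through `model_fn` and return accuracy."""
    correct = 0
    for example in task["examples"]:
        prompt = task["template"].format(**example)
        prediction = model_fn(prompt)
        correct += int(prediction.strip() == example["answer"])
    return correct / len(task["examples"])

# Toy task and a stub "model" that always answers "4".
task = {
    "template": "Q: {question}\nA:",
    "examples": [
        {"question": "2 + 2?", "answer": "4"},
        {"question": "3 + 3?", "answer": "6"},
    ],
}
accuracy = evaluate(task, lambda prompt: "4")
```

Real harnesses add batching, log-likelihood scoring, and per-task metrics on top of this skeleton, but the prompt-render/generate/score cycle is the same.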
hugobowne / What Comes After LLMs — Research Summary
Created November 13, 2025 11:06
Short research synthesis on "what comes after LLMs" — summary of themes across 5 sources (OpenAI, Google results, TechTarget, O'Reilly, Forbes).
# What Comes After LLMs — Research Notes
Short Description of Research Question
What are the leading ideas, research directions, and likely near-term and longer-term evolutions in AI that could follow, complement, or supplant large language models (LLMs)?
## Summary of Findings
Across the sources reviewed, there is broad agreement that LLMs will remain important but are likely to be complemented (not immediately replaced) by a range of approaches that address LLM weaknesses (hallucination, lack of continuous learning, heavy compute, limited reasoning, limited embodiment). Key themes:
hugobowne / Agentic Context Engineering for AI Agents — Concepts, Benchmarks, and Practices
Created November 13, 2025 11:03
What agentic context engineering is, why it matters for AI agents, recent benchmark evidence, and practical design patterns/frameworks to implement it effectively.
# Agentic Context Engineering for AI Agents — Concepts, Benchmarks, and Practices
Short research on how AI agents select, manage, and structure information in their limited context windows (“context engineering”), with a focus on evidence from recent benchmarks and framework docs.
## Summary of Findings
- What “context engineering” means: deciding what information an agent puts into its prompt at any moment. Agentic context engineering shifts that decision to the agent itself via retrieval, search, and memory operations instead of humans hand-curating prompts. [Letta – Context-Bench]
- Why it matters: models do not use long context uniformly—performance degrades as inputs get longer, and structure, distractors, and semantic similarity all influence outcomes (“context rot”). This makes targeted, minimal, well-structured context critical. [Chroma Research]
- Finite context and reliability limits: classical agent components (planning, memory, tool use) are constrained by context length, and natural-language I/O between components adds further reliability limits.
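The agent-side selection described above can be sketched as a retrieve-and-pack step: score candidate snippets against the current query, then fit the best ones into a fixed token budget. Real agents use embedding retrievers; the word-overlap scorer and whitespace token count below are stand-ins:

```python
# Hedged sketch of agentic context selection under a budget. The scoring
# function (word overlap) and token counting (whitespace split) are
# placeholders for a real retriever and tokenizer.

def select_context(query, snippets, budget_tokens):
    query_words = set(query.lower().split())
    # Rank candidates by overlap with the query.
    scored = sorted(
        snippets,
        key=lambda s: len(query_words & set(s.lower().split())),
        reverse=True,
    )
    chosen, used = [], 0
    for snippet in scored:
        cost = len(snippet.split())  # crude token count
        if used + cost <= budget_tokens:
            chosen.append(snippet)
            used += cost
    return chosen

snippets = [
    "the cache layer stores recent tool results",
    "billing runs nightly in the finance service",
    "tool results are stored in the cache layer for reuse",
]
picked = select_context("where are tool results cached", snippets, budget_tokens=10)
```

The budget constraint is what distinguishes this from plain retrieval: even relevant snippets are dropped once the window is full, which is why minimal, well-structured context matters.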
hugobowne / Context Engineering for AI Agents — Key Patterns and Best Practices (2025)
Created November 13, 2025 10:57
Concise synthesis of current best practices for context engineering in AI agents, drawn from Latent Space (Jeff Huber/Chroma) and Lance Martin’s write-up.
# Context Engineering for AI Agents — Key Patterns and Best Practices (2025)
Short Description of Research Question
What is “context engineering” for AI agents and what practical strategies, pitfalls, and best practices are recommended by current leading sources?
## Summary of Findings
Context engineering is the art and science of filling an LLM’s context window with the right information at each step of an agent’s trajectory. Two recent, influential sources converge on a practical toolkit and operating principles:
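One tactic both sources recommend is pruning the agent's message history to fit the window while always preserving the system prompt and the most recent turns. A sketch, assuming a simple role/content message format and whitespace token counting as a placeholder:

```python
# Sketch of history trimming for an agent's context window. The message
# schema and token counting are illustrative assumptions.

def trim_history(messages, budget_tokens):
    """Keep the system prompt plus as many recent turns as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(len(m["content"].split()) for m in system)
    for message in reversed(rest):  # walk newest-first
        cost = len(message["content"].split())
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))  # restore chronological order

history = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "first question about logs"},
    {"role": "assistant", "content": "answer one"},
    {"role": "user", "content": "second question"},
]
trimmed = trim_history(history, budget_tokens=10)
```

Production systems often summarize the dropped turns instead of discarding them outright, but the keep-system-plus-recent pattern is the common baseline.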
hugobowne / Building MCP Servers — Quickstart and Example Code
Created November 13, 2025 10:52
Notes and references for building an MCP server using the official tutorial and example code.
# Building MCP Servers
How to build a Model Context Protocol (MCP) server, with steps and example code references.
## Summary of Findings
- Core capabilities an MCP server can expose:
  - Resources: file-like data that clients can read (e.g., API responses, file contents)
  - Tools: functions callable by the LLM (with user approval)
  - Prompts: prewritten templates to help users perform tasks
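To make the three capability types concrete, here is a stdlib-only toy that models the registries an MCP server maintains. This is not the MCP SDK's API — a real server speaks JSON-RPC over stdio or HTTP — it only illustrates how resources, tools, and prompts are distinct kinds of registered capabilities:

```python
# Hypothetical mini-registry mirroring the three MCP capability types.
# Class and method names are invented for illustration.

class MiniServer:
    def __init__(self):
        self.resources = {}  # uri -> callable returning file-like data
        self.tools = {}      # name -> function the LLM may invoke
        self.prompts = {}    # name -> prewritten template string

    def tool(self, fn):
        """Decorator registering a function as a callable tool."""
        self.tools[fn.__name__] = fn
        return fn

    def call_tool(self, name, **kwargs):
        return self.tools[name](**kwargs)

server = MiniServer()
server.resources["file:///notes.txt"] = lambda: "note contents"
server.prompts["summarize"] = "Summarize the following:\n{text}"

@server.tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

result = server.call_tool("add", a=2, b=3)
```

The decorator-based tool registration mirrors the style the official tutorial and FastMCP use, which is why it is a natural fit for Python servers.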
hugobowne / FastMCP Overview
Created November 13, 2025 10:45
Overview of FastMCP framework for building MCP servers and clients.
# FastMCP Overview
Exploring FastMCP, a Pythonic framework for building Model Context Protocol (MCP) servers and clients, which offers a high-level interface and numerous features for efficient development.
## Summary of Findings
FastMCP is a Python framework that simplifies developing Model Context Protocol (MCP) servers and clients. It provides a high-level, Pythonic interface focused on reducing the complexity of implementing MCP, which connects large language models (LLMs) to various tools and datasets.
Key features of FastMCP include:
- High-level, Pythonic interface for ease of use and rapid development.
hugobowne / AI Context Engineering - Summary of Findings
Created November 13, 2025 10:43
Research on AI context engineering exploring voice interface opportunities and user context.
# AI Context Engineering
Exploration of context engineering in AI systems.
## Summary of Findings
- **BuiltIn Attempt**: An attempt to access an article on "AI Context Engineering" from BuiltIn ended in a 404 error, thus no information was obtained from that source.
- **Gartner Insights**: The Gartner page "Seize New Voice Interface Opportunities Amid the Pandemic" touches on context engineering only tangentially: it discusses adapting voice interfaces to user contexts (e.g., voice marketing for hands-free convenience and health safety), but its focus is pandemic-era marketing rather than context engineering for AI more broadly.
## Sources
- [404 Page - BuiltIn](https://builtin.com/artificial-intelligence/context-engineering-ai) - Attempted access resulted in a 404 error page, no content found.
hugobowne / Introduction to Machine Learning
Created November 13, 2025 10:30
Summary of key concepts and applications in machine learning.
# Introduction to Machine Learning
Machine learning is a subset of artificial intelligence (AI) that focuses on the use of algorithms to learn from and make predictions based on data. This field has emerged as a backbone for most modern AI systems and is pivotal in applications ranging from forecasting to autonomous vehicles.
## Summary of Findings
Machine learning utilizes algorithms to detect patterns in data, allowing systems to make inferences about new data without explicit programming. It has transformed the way AI operates, underpinning various technologies such as large language models (LLMs), computer vision, and generative AI tools. Machine learning can be classified into three main types:
- **Supervised Learning**: Involves training a model on labeled data to make predictions based on new inputs.
- **Unsupervised Learning**: Identifies patterns in unlabeled data without predefined labels or outcomes.
- **Reinforcement Learning**: Trains models through trial and error to maximize a reward signal based on their actions in an environment.
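The supervised-learning bullet above can be made concrete with one of the simplest possible models: a nearest-neighbour classifier that predicts the label of whichever training example lies closest to a new point. Pure stdlib, illustrative only:

```python
# Tiny supervised-learning example: 1-nearest-neighbour classification.
# The training data and labels are made up for illustration.

def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: sq_dist(ex[0], point))
    return label

# Labeled data: (features, label) pairs — the "supervision".
train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.9), "small"),
    ((8.0, 9.0), "large"),
    ((9.1, 8.5), "large"),
]
prediction = nearest_neighbor(train, (8.5, 9.2))
```

Note there is no explicit rule mapping features to labels: the mapping is inferred from the labeled examples, which is exactly what distinguishes learning from explicit programming.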
hugobowne / Re-ranking Techniques
Created November 13, 2025 10:25
Summary of re-ranking techniques used in information retrieval and recommendation systems.
# Re-ranking Techniques
Re-ranking techniques are methods used in information retrieval and recommendation systems to improve the ranking of items in response to a certain query or user input. These techniques can refine initial rankings produced by an algorithm to enhance the accuracy and relevance of results.
## Summary of Findings
Re-ranking involves adjusting the order of results to better align with user preferences, intent, or additional contextual information. Techniques can include:
- **Learning to Rank (LTR)**: A machine learning approach that learns from user interactions and past queries to improve the ranking process.
  - *Pointwise* LTR scores and ranks each item independently.
  - *Pairwise* LTR learns relative preferences between pairs of items.
  - *Listwise* LTR optimizes the ordering of the entire result list.
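Pointwise re-ranking reduces to two steps: score each candidate independently, then re-sort by score. A sketch, where the linear scorer and its feature names stand in for a trained model:

```python
# Sketch of pointwise re-ranking. The features and weights below are
# invented; a real system would use a learned scoring model.

def rerank(candidates, score_fn):
    """Return candidates ordered by descending pointwise score."""
    return sorted(candidates, key=score_fn, reverse=True)

# Each candidate carries features a trained model would normally consume.
candidates = [
    {"doc": "a", "click_rate": 0.10, "freshness": 0.8},
    {"doc": "b", "click_rate": 0.40, "freshness": 0.2},
    {"doc": "c", "click_rate": 0.25, "freshness": 0.8},
]

def linear_score(item):
    # Hypothetical learned weights: favour click rate, then freshness.
    return 0.7 * item["click_rate"] + 0.3 * item["freshness"]

ranked = [c["doc"] for c in rerank(candidates, linear_score)]
```

Because each item is scored in isolation, pointwise methods cannot model interactions between results — that limitation is what motivates the pairwise and listwise variants.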
hugobowne / Context Rot in Agent-based Systems
Created November 13, 2025 10:21
A summary of research findings on context rot in agent-based systems, its causes, implications, and strategies for mitigation.
# Context Rot in Agent-based Systems
This report summarizes findings regarding the phenomenon of context rot in agent-based systems.
## Summary of Findings
Context rot refers to the degradation of contextual information that can impact the functionality and performance of agent-based systems. This degradation occurs over time, as the contextual data becomes outdated or invalid, leading to inefficiencies and errors in decision-making processes. Key factors contributing to context rot include the dynamic nature of environments where agents operate, the complexity of interactions between agents and their contexts, and insufficient mechanisms for updating and managing contextual information.
To mitigate context rot, it is essential to implement robust strategies for continuous learning and adaptation of agents, ensuring they can dynamically update their contextual understanding in real time. Strategies may involve the incorporation of machine learning techniques, where agents recognize and learn from changes in their environment and revise outdated contextual information accordingly.
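One simple mechanism for the "outdated or invalid" case described above is to timestamp each context entry and evict anything older than a time-to-live, so stale facts cannot linger in the agent's working context. A sketch, with an illustrative TTL policy:

```python
# Sketch of TTL-based eviction for an agent's context store. The entry
# schema and TTL value are illustrative assumptions.

import time

def evict_stale(entries, ttl_seconds, now=None):
    """Keep only entries observed within the last `ttl_seconds`."""
    now = time.time() if now is None else now
    return [e for e in entries if now - e["observed_at"] <= ttl_seconds]

now = 1_000_000.0
entries = [
    {"fact": "deploy finished", "observed_at": now - 30},
    {"fact": "user is on page /billing", "observed_at": now - 600},
]
fresh = evict_stale(entries, ttl_seconds=120, now=now)
```

TTL eviction is a blunt instrument — facts age at different rates — so richer systems pair it with re-validation, where the agent re-queries a source before relying on an expiring entry.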