
hugobowne / AI Agents — Overview and Context
Created November 13, 2025 11:49
Synthesis of definitions, frameworks, examples, benchmarks, benefits, and risks for AI agents (agentic AI).
# AI Agents — Overview and Context
## Short Description of Research Question
What are AI agents (agentic AI)? How are they defined and classified, what practical frameworks and tools exist for building them, what real-world examples and industry perspectives illustrate them, and what are the main benefits, risks, and benchmark/evaluation issues?
## Summary of Findings
- Definitions & conceptual foundations
- The term "intelligent agent" refers to any entity that perceives its environment, takes actions autonomously to achieve goals, and may improve via learning or knowledge acquisition (Wikipedia). "Agentic AI" describes modern systems that proactively pursue goals, plan, integrate tools, and act over extended periods, usually powered by LLMs and orchestration software.
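To make the perceive-act framing concrete, here is a minimal, self-contained sketch of the loop. Every name in it is hypothetical; in an agentic system the `decide` stub would be replaced by an LLM call that reads the observation plus tool descriptions and returns an action.

```python
# Toy perceive-decide-act loop; all names are illustrative, not from any framework.
from dataclasses import dataclass

@dataclass
class Observation:
    text: str

class CounterEnvironment:
    """Toy 'world': a counter the agent must raise to a goal value."""
    def __init__(self, goal: int):
        self.state, self.goal = 0, goal

    def observe(self) -> Observation:
        return Observation(text=f"state={self.state} goal={self.goal}")

    def act(self, action: str) -> None:
        if action == "increment":
            self.state += 1

def decide(obs: Observation) -> str:
    # Stand-in policy; an agentic system would prompt an LLM here instead.
    state, goal = (int(part.split("=")[1]) for part in obs.text.split())
    return "increment" if state < goal else "stop"

env = CounterEnvironment(goal=3)
while (action := decide(env.observe())) != "stop":
    env.act(action)
print(env.observe().text)  # -> state=3 goal=3
```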
hugobowne / Research Summary on Large Language Models for Low Resource Languages
Created November 13, 2025 11:29
Summary of recent research on LLMs and NLP for low resource languages.
# Research on Large Language Models (LLMs) for Low Resource Languages
## Short Description of Research Question
How do recent studies address the challenges and improvements of LLMs and other language technologies for low-resource languages?
## Summary of Work
Recent research on LLMs and NLP for low-resource languages addresses diverse challenges including data scarcity, linguistic bias, domain specificity, and evaluation dataset creation. Major themes include:
1. Workshop Overview: The LoResLM 2025 workshop showcased 35 papers focusing on linguistic inclusivity in NLP for low-resource languages across multiple language families and research areas.
# Reranking with Large Language Models (LLMs)
## Short Description of Research Question
How can hypotheses or retrieved passages be efficiently reranked with Large Language Models to optimize for quality metrics beyond model probability, while managing the computational cost?
## Summary of Work
1. **EEL: Efficiently Encoding Lattices for Reranking (2023)**
- Investigates reranking hypotheses for conditional text generation by encoding lattices of outputs efficiently with Transformers.
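As a generic illustration of the cost/quality tradeoff in the question above (not the EEL lattice method), one common pattern is LLM-as-judge reranking: score each candidate with a relevance prompt, then sort. A hedged sketch assuming the OpenAI Python SDK; the model name and prompt are placeholders, and the one-call-per-candidate cost is exactly what efficiency-oriented work like EEL tries to avoid.

```python
# LLM-as-judge reranking sketch (illustrative; not the EEL lattice method).
# Assumes the OpenAI Python SDK; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

def relevance(query: str, passage: str) -> float:
    """Ask the LLM for a 0-10 relevance score for one candidate passage."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (f"Rate from 0 to 10 how well this passage answers the query. "
                        f"Reply with a number only.\nQuery: {query}\nPassage: {passage}"),
        }],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0

def rerank(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    # One LLM call per candidate: a quality signal beyond retrieval scores,
    # at O(n) inference cost per query (the tradeoff noted above).
    return sorted(passages, key=lambda p: relevance(query, p), reverse=True)[:top_k]
```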
hugobowne / Context engineering ai - browsing failed
Created November 13, 2025 11:21
Automated research attempt for 'context engineering AI' — browsing tools failed; record of attempts and recommended next steps.
# Context Engineering AI - Automated Research Attempt
## Short Description of Research Question
- Research topic: "context engineering" in AI — definitions, practices, tools, notable papers, tutorials, and community resources.
## Summary of Findings
I attempted to perform automated web research using the required browser-automation tools, but the browsing/navigation tool failed repeatedly, so I could not visit external websites to extract information. Because no pages could be visited, there are no research findings; instead, below is a record of the failed attempts, the errors encountered, and recommended next steps.
hugobowne / Context engineering and context rot
Created November 13, 2025 11:13
Research summary: Context engineering and context rot
# Context engineering and context rot
## Short Description of Research Question
What is "context rot" (the failure modes of LLMs as context length grows) and what context-engineering practices and mitigations are recommended by recent research and industry sources?
## Summary of Findings
- Definition: "Context rot" refers to the phenomenon where increasing the amount of tokens in a model's context window (longer inputs / longer histories) leads to degraded, inconsistent, or unreliable model performance — e.g., forgetting facts in the middle of long documents, hallucinations, or refusals.
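Context rot is typically measured with needle-in-a-haystack probes: bury one fact at varying depths in long filler text and check whether the model can still retrieve it. A minimal sketch, assuming the OpenAI Python SDK; the model, needle, and filler are placeholders.

```python
# Needle-in-a-haystack probe sketch for context rot.
# Assumes the OpenAI Python SDK; model, needle, and filler are placeholders.
from openai import OpenAI

client = OpenAI()
NEEDLE = "The access code for the vault is 7142."
FILLER = "The sky was a uniform grey that morning. " * 400  # distractor text

def probe(depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) and query it."""
    cut = int(len(FILLER) * depth)  # may split a filler sentence; harmless here
    document = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": document + "\n\nWhat is the access code for the vault?"}],
    )
    return resp.choices[0].message.content

for depth in (0.0, 0.5, 1.0):
    print(depth, probe(depth))  # mid-document depths often degrade first
```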
hugobowne / AI Agent Harness — Survey of Open-Source Evaluation Frameworks
Created November 13, 2025 11:10
Survey of open-source evaluation/agent harness frameworks (lm-eval-harness, OpenAI Evals, LangChain, HELM, lmms-eval, leaderboards)
# AI Agent Harness Survey
## Short Description of Research Question
I surveyed major open-source projects and leaderboards for evaluating language models (LLMs), multimodal models (LMMs), and agent systems to answer: "What existing harnesses, frameworks, and leaderboards are available for evaluating LLMs/LMMs and building/running agent harnesses?"
## Summary of Findings
- lm-evaluation-harness (EleutherAI)
- A mature, widely used framework for few-shot evaluation of language models. Supports many model backends (HF transformers, vLLM, GGUF/llama.cpp, OpenAI/Anthropic/TextSynth APIs, SGLang, NeMo, etc.), many tasks (>60 academic benchmarks), flexible prompt templating (Jinja2, Promptsource), caching, logging, and hub integration. It is also the backend for Hugging Face's Open LLM Leaderboard.
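For orientation, programmatic use of lm-evaluation-harness looks roughly like the following. This is a sketch based on the project's `simple_evaluate` entry point; exact argument names can vary across versions.

```python
# Hedged sketch of programmatic use of EleutherAI's lm-evaluation-harness
# (`pip install lm-eval`); keyword names may differ between versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face transformers backend
    model_args="pretrained=gpt2",  # any HF model id
    tasks=["hellaswag"],           # one of the 60+ bundled benchmarks
    num_fewshot=5,
)
print(results["results"]["hellaswag"])  # per-task metrics dict
```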
hugobowne / What Comes After LLMs — Research Summary
Created November 13, 2025 11:06
Short research synthesis on "what comes after LLMs" — summary of themes across 5 sources (OpenAI, Google results, TechTarget, O'Reilly, Forbes).
# What Comes After LLMs — Research Notes
## Short Description of Research Question
What are the leading ideas, research directions, and likely near-term and longer-term evolutions in AI that could follow, complement, or supplant large language models (LLMs)?
## Summary of Findings
Across the sources reviewed, there is broad agreement that LLMs will remain important but are likely to be complemented (not immediately replaced) by a range of approaches that address LLM weaknesses (hallucination, lack of continuous learning, heavy compute, limited reasoning, limited embodiment). Key themes:
hugobowne / Agentic Context Engineering for AI Agents — Concepts, Benchmarks, and Practices
Created November 13, 2025 11:03
What agentic context engineering is, why it matters for AI agents, recent benchmark evidence, and practical design patterns/frameworks to implement it effectively.
# Agentic Context Engineering for AI Agents — Concepts, Benchmarks, and Practices
Short research on how AI agents select, manage, and structure information in their limited context windows (“context engineering”), with a focus on evidence from recent benchmarks and framework docs.
## Summary of Findings
- What “context engineering” means: deciding what information an agent puts into its prompt at any moment. Agentic context engineering shifts that decision to the agent itself via retrieval, search, and memory operations instead of humans hand-curating prompts. [Letta – Context-Bench]
- Why it matters: models do not use long context uniformly—performance degrades as inputs get longer, and structure, distractors, and semantic similarity all influence outcomes (“context rot”). This makes targeted, minimal, well-structured context critical. [Chroma Research]
- Finite context and reliability limits: classical agent components (planning, memory, tool use) are constrained by context length, and natural-language I
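One way to picture agentic context engineering: the agent issues its own retrieval query and places only the top hits in its prompt, rather than a human hand-curating everything up front. A toy sketch, with naive substring-match search standing in for real embedding or BM25 retrieval; all names here are hypothetical.

```python
# Agent-driven context retrieval sketch: the agent, not a human, decides
# which notes enter the prompt. `search_notes` is a hypothetical toy tool.
def search_notes(query: str, notes: dict[str, str], k: int = 2) -> list[str]:
    """Naive lexical search over a note store (real agents use embeddings/BM25)."""
    words = query.lower().rstrip("?").split()
    scored = sorted(notes.items(),
                    key=lambda kv: sum(w in kv[1].lower() for w in words),
                    reverse=True)
    return [body for _, body in scored[:k]]

def build_prompt(task: str, notes: dict[str, str]) -> str:
    # The retrieval query is derived from the task itself, and only the
    # top hits land in context instead of the whole note store.
    hits = search_notes(task, notes)
    return f"Task: {task}\n\nRelevant notes:\n" + "\n---\n".join(hits)

notes = {"a": "Deploy steps for the staging cluster",
         "b": "Team lunch menu",
         "c": "Staging cluster rollback procedure"}
print(build_prompt("How do I roll back staging?", notes))
```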
hugobowne / Context Engineering for AI Agents — Key Patterns and Best Practices (2025)
Created November 13, 2025 10:57
Concise synthesis of current best practices for context engineering in AI agents, drawn from Latent Space (Jeff Huber/Chroma) and Lance Martin’s write-up.
# Context Engineering for AI Agents — Key Patterns and Best Practices (2025)
## Short Description of Research Question
What is “context engineering” for AI agents and what practical strategies, pitfalls, and best practices are recommended by current leading sources?
## Summary of Findings
Context engineering is the art and science of filling an LLM’s context window with the right information at each step of an agent’s trajectory. Two recent, influential sources converge on a practical toolkit and operating principles:
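As one concrete instance of that toolkit, a per-step context assembler selects what goes into the window under a token budget: system prompt first, then recent history, then retrieved documents. A deliberately naive sketch; the whitespace token counting and fixed budget split are placeholder choices.

```python
# Per-step context assembly under a token budget (naive, illustrative sketch).
def assemble_context(system: str, history: list[str], retrieved: list[str],
                     budget: int) -> str:
    def tokens(s: str) -> int:
        return len(s.split())  # stand-in for a real tokenizer

    parts, used = [system], tokens(system)
    # Walk history newest-first; inserting at index 1 keeps chronological order.
    for turn in reversed(history):
        if used + tokens(turn) > budget * 0.6:  # reserve ~40% for retrieval
            break
        parts.insert(1, turn)
        used += tokens(turn)
    for doc in retrieved:                       # assumed relevance-ranked
        if used + tokens(doc) > budget:
            break
        parts.append(doc)
        used += tokens(doc)
    return "\n\n".join(parts)
```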
hugobowne / Building MCP Servers — Quickstart and Example Code
Created November 13, 2025 10:52
Notes and references for building an MCP server using the official tutorial and example code.
# Building MCP Servers
How to build a Model Context Protocol (MCP) server, with steps and example code references.
## Summary of Findings
- Core capabilities an MCP server can expose (a minimal server sketch follows this list):
- Resources: file-like data clients can read (e.g., API responses, file contents)
- Tools: functions callable by the LLM (with user approval)
- Prompts: prewritten templates to help users perform tasks
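Putting those three capability types together, a minimal MCP server using the official Python SDK's FastMCP helper might look like this. Assumes `pip install mcp`; the server, tool, resource, and prompt names are illustrative.

```python
# Minimal MCP server sketch using the official Python SDK (names illustrative).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """A tool: a function the LLM can call (with user approval)."""
    return a + b

@mcp.resource("config://app")
def get_config() -> str:
    """A resource: file-like data clients can read."""
    return "App configuration goes here"

@mcp.prompt()
def review_code(code: str) -> str:
    """A prompt: a prewritten template that helps users perform a task."""
    return f"Please review this code:\n\n{code}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```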