hugobowne · November 13, 2025 11:49
 # AI Agents — Overview and Context

 ## Short Description of Research Question

 What are AI agents (agentic AI), how are they defined and classified, what practical frameworks and tools exist for building them, what real-world examples and industry perspectives exist, and what are the main benefits, risks, and benchmark/evaluation issues?

 ## Summary of Findings

 - Definitions & conceptual foundations
  - The term "intelligent agent" refers to any entity that perceives its environment, takes actions autonomously to achieve goals, and may improve via learning or knowledge acquisition (Wikipedia). "Agentic AI" describes modern systems that proactively pursue goals, plan, integrate tools, and act over extended periods, usually powered by LLMs and orchestration software.
  - Agent classes include simple reflex, model-based, goal-based, utility-based, and learning agents (Wikipedia). Agentic AI, as described by Wikipedia, HBR, and WIRED, is a class that emphasizes autonomy, long-running planning, memory, and tool integration.

 - Practical frameworks & developer tooling
  - LangChain (with its LangGraph runtime) provides a widely used agent abstraction and developer tooling: an agent can be created in a few lines of code, and the runtime supports durable execution, streaming, human-in-the-loop, and persistence, with debugging via LangSmith. The docs include example code that creates an agent with tools and invokes it programmatically (LangChain docs).
  - Industry frameworks and multi-agent toolkits such as CAMEL and Microsoft AutoGen are cited as part of the broader agent ecosystem (Wikipedia); the LangChain docs, for their part, point developers to LangGraph and LangSmith.

 - Representative examples & industry perspective
  - Startups, open-source projects, and large labs have demonstrated agents that take actions: Devin AI (an "AI developer" demo), Auto-GPT experiments, and DeepMind's SIMA for playing video games. Journalistic coverage (WIRED) highlights these demos and the surrounding hype cycle, noting that staged demos can be impressive yet brittle.
  - HBR frames agentic AI as a shift in how humans interact with AI (examples: travel planning agents, virtual caregivers, supply-chain optimization) and discusses implications for work.

 - Benchmarking, evaluation, and research concerns
  - The arXiv paper "AI Agents That Matter" (Kapoor et al.) critiques current benchmarking practices: an overemphasis on accuracy, insufficient attention to cost, conflation of the needs of model developers with those of downstream developers, inadequate holdout sets, overfitting to benchmarks, and a lack of standardized evaluation and reproducibility. The paper proposes jointly optimizing accuracy and cost and recommends evaluation practices aimed at real-world usefulness.

 - Benefits claimed
  - Potential gains in personal and organizational productivity, automation of repetitive tasks, and letting people, for example in accessibility use cases, offload complex workflows (HBR, Wikipedia summaries).

 - Risks and concerns
  - Safety and security: agents that act (not just advise) increase risk of costly errors, security vulnerabilities, and misuse (WIRED, Wikipedia).
  - Privacy and data access concerns when agents integrate with external systems or hold user credentials.
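  - Reward hacking, hallucinations, algorithmic bias, and errors that compound across multi-step actions (if each step succeeds independently with probability p, an n-step task succeeds with probability p^n, so 95% per-step reliability over 10 steps yields only about 60%).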
  - Operational costs and environmental impact due to compute and orchestration demand.
  - Lack of standardization, reproducibility, and robust benchmarks hinders reliable deployment (arXiv paper, Wikipedia).
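
Kapoor et al.'s recommendation to optimize accuracy and cost jointly can be made concrete as a Pareto frontier: an agent is only worth considering if no other agent is both at least as accurate and at most as expensive. The Python sketch below assumes each agent's benchmark run has already been summarized as one accuracy score and one dollar cost; the `AgentResult` type, the `dominates` and `pareto_frontier` helpers, the agent names, and all numbers are hypothetical illustrations, not code or results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    """One agent's score on a benchmark (hypothetical record type)."""

    name: str
    accuracy: float  # fraction of tasks solved, 0.0 to 1.0
    cost_usd: float  # total inference cost of one benchmark run, in dollars


def dominates(b: AgentResult, a: AgentResult) -> bool:
    """True if b is at least as accurate and at most as costly as a,
    and strictly better on at least one of the two axes."""
    no_worse = b.accuracy >= a.accuracy and b.cost_usd <= a.cost_usd
    strictly_better = b.accuracy > a.accuracy or b.cost_usd < a.cost_usd
    return no_worse and strictly_better


def pareto_frontier(results: list[AgentResult]) -> list[AgentResult]:
    """Keep only agents that no other agent dominates, sorted cheapest first."""
    frontier = [a for a in results if not any(dominates(b, a) for b in results)]
    return sorted(frontier, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; NOT results from Kapoor et al.
    runs = [
        AgentResult("baseline-single-call", accuracy=0.62, cost_usd=0.40),
        AgentResult("baseline-retry", accuracy=0.71, cost_usd=2.10),
        AgentResult("agent-multi-step", accuracy=0.70, cost_usd=9.80),
        AgentResult("agent-reflective", accuracy=0.74, cost_usd=6.50),
    ]
    for r in pareto_frontier(runs):
        print(f"{r.name}: accuracy={r.accuracy:.2f}, cost=${r.cost_usd:.2f}")
```

With these made-up numbers, `agent-multi-step` drops out because `baseline-retry` is both more accurate and cheaper; the remaining three agents form the frontier, and choosing among them becomes an explicit cost/accuracy trade-off rather than a single accuracy ranking. The pairwise check is quadratic in the number of agents, which is fine for the handful of systems a typical benchmark comparison contains.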

 ## Sources

 - [OpenAI — Chat guide (blocked)](https://platform.openai.com/docs/guides/chat) - attempted access; Cloudflare/anti-bot challenge prevented content retrieval (page showed "Just a moment...").
 - [OpenAI — Agents guide (blocked)](https://platform.openai.com/docs/guides/agents) - attempted access; Cloudflare/anti-bot challenge prevented content retrieval.
 - [Intelligent agent — Wikipedia](https://en.wikipedia.org/wiki/Intelligent_agent) - detailed definitions, agent classes (reflex, model-based, goal-based, utility-based, learning), concept of agentic AI, frameworks and examples, benefits and concerns.
 - [What Is Agentic AI, and How Will It Change Work? — Harvard Business Review](https://hbr.org/2024/12/what-is-agentic-ai-and-how-will-it-change-work) - essay framing agentic AI as a shift for workplaces with practical examples (travel planning, virtual caregiving, supply-chain specialists) and discussion of implications for work.
 - [LangChain overview / Agents docs (LangChain)](https://docs.langchain.com/oss/python/langchain/overview) - documentation showing LangChain's agent abstractions, LangGraph runtime, example code to create & invoke agents, and ecosystem tooling (LangSmith) for debugging/tracing agent behavior.
 - [LangChain blog — introducing agents (404)](https://blog.langchain.com/introducing-langchain-agents/) - visited but returned a 404 / page not found.
 - [OpenAI research — Agents (blocked)](https://openai.com/research/agents) - attempted access; Cloudflare/anti-bot challenge prevented content retrieval (page showed "Just a moment...").
 - [AI Agents That Matter — arXiv (Kapoor et al., 2024)](https://arxiv.org/abs/2407.01502) - research preprint analyzing agent benchmarks, highlighting shortcomings in current evaluation practices and proposing principled improvements (joint accuracy+cost objectives, better holdouts, reproducibility).
 - [Forget Chatbots. AI Agents Are the Future — WIRED](https://www.wired.com/story/fast-forward-forget-chatbots-ai-agents-are-the-future/) - industry reporting on demos (Devin, Auto-GPT, SIMA), analysis of promise vs. brittleness, and discussion of commercialization/hype.


 ---

 ## Notes on methodology and browsing

 - I visited 9 pages (a mix of academic, industry, and documentation sources). Three OpenAI pages (two docs guides and the research page) were blocked by Cloudflare anti-bot challenges and could not be scraped, and one LangChain blog post returned a 404; I kept their URLs and noted the access issue for each.
 - This gist synthesizes definitions, frameworks, examples, evaluation critiques, and risks based on the sources successfully accessed above.