Agents have arrived and they need their own picks and shovels

We’re seeing a strong wave of AI agent applications that are increasingly capable of taking on and automating complex workflows. Much of this is driven by progress at the model layer, where frontier models are becoming better at planning and reasoning. As Sam Altman has outlined, the improved reasoning capabilities of models such as OpenAI o1 provide the foundation for potentially rapid progress towards highly capable AI agents that can understand context, plan and reason to make decisions, and then take actions to achieve goals.

However, improved models alone will not deliver advanced, autonomous agents, and as of today most models remain poor planners and low-level reasoners. Most AI agents are therefore still susceptible to errors that compound across multi-step processes, which leaves them unable to reliably take on complex end-to-end tasks. As a result, an ecosystem of enabling software and new architectural approaches is emerging to support agentic workflows through tool use, multi-agent frameworks, chain-of-thought reasoning, self-reflection, planning, and other methods.
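
To make the compounding-error point concrete, here is a minimal sketch of the arithmetic. It assumes every step succeeds independently with the same probability, which is a simplification for illustration only; the function name `end_to_end_success` and the 95% figure are hypothetical, not measurements of any real agent.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent, equally reliable steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent successes multiply, so reliability decays geometrically with length.
    return per_step_success ** steps


if __name__ == "__main__":
    # Illustrative assumption: each step is right 95% of the time.
    for steps in (1, 5, 10, 20, 50):
        rate = end_to_end_success(0.95, steps)
        print(f"{steps:>2} steps at 95% per step: {rate:.1%} end to end")
```

Under these assumptions, an agent that is right 95% of the time per step completes a 10-step task only about 60% of the time, and a 20-step task about 36% of the time. Real agents will deviate from this, since errors are rarely independent and some can be caught and corrected mid-task.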

We expect this AI agent infrastructure to evolve rapidly in the coming years as a result of (i) the abundance of new agent applications being built, and (ii) the changing challenges and opportunities that tooling can address as models improve and agents become increasingly unconstrained.

We are seeing the emergence of infrastructure and tooling including:

Enterprise platforms for building with agents, such as Emergence
Frameworks and developer kits for building agents, such as LangChain, Oscar, AgentKit (from BCG X), and Semantic Kernel
Platforms for hosting agents, such as LangServe and numerous platforms for enterprise LLM applications
Agent evaluations, such as AgentOps, Braintrust, Langfuse, and Context
Orchestration between different models and agents
Tools supporting the ability to personalise agent memory towards a given user and their current context, such as Letta
Tools supporting the ability of agents to take actions, such as NPi, Mindware, and Imprompt
Tools supporting agents in browsing the web and extracting web data, such as Reworkd, Tiny Fish, Browse AI, Browserbase, Apify, and Browserless
Authentication for agents to take actions on a user’s behalf, such as Anon and Clerk
Multi-agent frameworks supporting effective hierarchies and collaboration between several specialised agents, such as CrewAI, AutoGen, and AgentScope
Setting of guardrails and constraints to ensure an agent stays on track and avoids harmful or misleading output

