The Hyper Agent Protocol (HAP) is a new, simplified standard for how software "agents" (or any automated services and capabilities) communicate with each other over the web. Imagine a universal way for different software components to discover what other components can do and then ask them to do it, all using standard web technologies.
The goal of HAP is to make it much easier to build flexible, interoperable systems where different services, tools, or AI agents can work together seamlessly, regardless of how they are built internally.
At its heart, HAP treats every distinct capability or function as something that has its own unique web address (an HTTP URI).
- To find out what a specific agent endpoint can do, you simply make a standard web request (`GET`) to its URI.
- The agent endpoint responds with a "Capability Description Document" (CDD). This CDD is a JSON file that clearly lists:
  - The primary capability the endpoint offers by default.
  - Optionally, any additional related actions it can perform (distinguished by simple names like `#resize` or `#summarize` used as fragment identifiers, making them part of the main URI).
  - The expected inputs and outputs for each capability, defined using standard JSON Schemas. This ensures clarity on what data to send and what to expect back.
  - Authentication methods required to use the capabilities.
- Crucially, this CDD is structured using Linked Data principles, similar to JSON-LD and how Schema.org is used. This means the JSON is not just data, but self-describing data:
  - It uses an `@context` (a standard JSON-LD field) to define short names for terms and to link properties to well-defined vocabularies (like Schema.org or custom ones). This helps avoid ambiguity.
  - Objects and descriptions within the CDD can have an `@type` (a JSON-LD keyword) which specifies their type using a URI (e.g., a URI defining what a "CapabilityDescription" is, much like Schema.org defines types like `schema:Person` or `schema:Product`).
  - Specific pieces of information or descriptions can be given a globally unique `@id` (a JSON-LD keyword) using a URI, making them linkable and referenceable.
  - This approach makes the CDD highly structured and machine-readable, and allows agents to better understand the meaning and structure of the capabilities offered by linking them to shared definitions.
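As a rough illustration of the description step, the sketch below models a CDD as a Python dictionary and lists the capability URIs it declares. The key names `primaryCapability`, `additionalActions`, and `authentication`, the `hap` vocabulary URL, and the `agent.example.com` endpoint are assumptions made for this example; the text here does not fix a concrete layout.

```python
import json

# Hypothetical CDD. In practice this JSON would be the body of the GET response.
EXAMPLE_CDD = json.loads("""
{
  "@context": {"schema": "https://schema.org/", "hap": "https://example.org/hap#"},
  "@id": "https://agent.example.com/image",
  "@type": "hap:CapabilityDescription",
  "primaryCapability": {
    "schema:description": "Describe an image",
    "inputSchema": {"type": "object", "required": ["imageUrl"]},
    "outputSchema": {"type": "object"}
  },
  "additionalActions": {
    "#resize": {
      "schema:description": "Resize an image",
      "inputSchema": {"type": "object", "required": ["imageUrl", "width"]},
      "outputSchema": {"type": "object"}
    }
  },
  "authentication": ["Bearer"]
}
""")


def list_capabilities(cdd: dict) -> dict:
    """Map each capability URI (endpoint URI plus optional fragment) to its description."""
    base = cdd["@id"]
    capabilities = {base: cdd["primaryCapability"]}  # the default action lives at the bare URI
    for fragment, action in cdd.get("additionalActions", {}).items():
        capabilities[base + fragment] = action
    return capabilities
```

Calling `list_capabilities(EXAMPLE_CDD)` yields one entry for the bare endpoint URI and one for the `#resize` fragment, which mirrors the idea that every capability has its own address.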
- Once a client knows what an agent endpoint can do (from its CDD), it asks it to perform an action using a standard web `POST` request to the same agent endpoint URI.
- The `POST` request includes:
  - A unique request ID (as an HTTP URI) so both client and server can track this specific request and avoid accidental duplicates.
  - Optionally, a capability identifier (usually a short fragment like `"resize"`, or empty if invoking the default primary capability).
  - The arguments (data) needed for the capability, structured according to the JSON Schema described in the CDD.
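The invocation step might be sketched as follows, using only the Python standard library. The body keys `request_id` and `arguments`, the client-side URI prefix for request IDs, and the example endpoint are assumptions for illustration; `capability_identifier` is the only field name taken from this document.

```python
import json
import uuid
import urllib.request


def build_invocation(endpoint, arguments, capability_identifier="",
                     request_id_base="https://client.example.com/requests/"):
    """Build (but do not send) a POST request that invokes a capability."""
    body = {
        # A fresh HTTP URI per request lets both sides track and de-duplicate it.
        "request_id": request_id_base + str(uuid.uuid4()),
        "arguments": arguments,
    }
    # Omitting the identifier means "invoke the endpoint's primary capability".
    if capability_identifier:
        body["capability_identifier"] = capability_identifier
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the function stops short of that so the request can be inspected or signed first.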
A key concept in HAP is the flexible definition of an "agent endpoint":
- An agent endpoint is simply an HTTP URI that offers one or more capabilities according to the HAP protocol.
- An "agent" can be just a single, focused skill: If an endpoint offers only one primary capability (its default action), then that endpoint URI essentially is that skill. Invoking the endpoint invokes that skill.
- An "agent" can be a collection of related skills: An endpoint can offer a primary capability and a few additional, closely related actions. These additional actions are invoked by specifying their short name (fragment) in the `POST` request.
- This blurs the lines: a highly specialized "tool" (one skill) and a more complex "agent" (multiple skills) are interacted with using the exact same HAP protocol. The difference lies in the richness of their Capability Description Document.
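To make the "tool versus agent" point concrete, here is a minimal server-side sketch in which a single-skill endpoint and a multi-skill endpoint share one dispatch path. The `AgentEndpoint` class and the handler signatures are hypothetical names introduced for this example, not part of HAP.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[[Dict[str, Any]], Any]


class AgentEndpoint:
    """One endpoint URI: a primary capability plus optional named actions."""

    def __init__(self, primary: Handler, additional: Optional[Dict[str, Handler]] = None):
        self._primary = primary
        self._additional = dict(additional or {})

    def invoke(self, arguments: Dict[str, Any], capability_identifier: str = "") -> Any:
        # Accept both "resize" and "#resize"; an empty name selects the default.
        name = capability_identifier.lstrip("#")
        if not name:
            return self._primary(arguments)
        if name not in self._additional:
            raise KeyError(f"unknown capability: {name}")
        return self._additional[name](arguments)


# A "tool": exactly one skill, so the endpoint is the skill.
summarizer = AgentEndpoint(primary=lambda args: args["text"][:10])

# An "agent": a default skill plus a related action.
image_agent = AgentEndpoint(
    primary=lambda args: "described",
    additional={"resize": lambda args: f"resized to {args['width']}"},
)
```

Both objects are called through the same `invoke` method; only the set of declared capabilities differs.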
HAP defines clear ways for actions to occur, with JSON Schemas defining the data formats for requests and responses:
- The client sends a `POST` request.
- The agent endpoint performs the action immediately and sends back a standard `200 OK` response containing the `result` (e.g., the summarized text, the processed data).
- The response also includes a unique response ID (as an HTTP URI) and echoes the client's request ID.
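A client might validate a synchronous reply along these lines before trusting it. The keys `request_id` and `response_id` are assumed names for the IDs described here; `result` is the field named in the text.

```python
def extract_result(status_code: int, response: dict, sent_request_id: str):
    """Return the result of a synchronous call after basic correlation checks."""
    if status_code != 200:
        raise ValueError(f"expected 200 OK, got {status_code}")
    # The server must echo the request ID so the reply can be matched to the call.
    if response.get("request_id") != sent_request_id:
        raise ValueError("response does not echo the request ID")
    # Each response carries its own unique ID (an HTTP URI).
    if not str(response.get("response_id", "")).startswith(("http://", "https://")):
        raise ValueError("response ID is missing or not an HTTP URI")
    return response["result"]
```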
- For actions that take time, the client sends a `POST` request.
- The agent endpoint immediately responds with a `202 Accepted` message. This message includes:
  - A unique operation ID (an HTTP URI). This ID represents the ongoing task and its URI can be used for status checks or cancellation requests.
  - The client's request ID and a unique response ID for this acknowledgment.
- When the agent endpoint finishes the task, it sends a new `POST` request (a "webhook") to a callback URI provided by the client. This webhook contains the final `result` or an error.
- This webhook also includes its own unique delivery ID and the original request ID and operation ID for correlation.
- Clients can also (if the agent endpoint supports it) use the `operation_id` URI to `GET` the status of the ongoing task or `POST` to request its cancellation.
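The correlation rules for the asynchronous flow can be sketched with a small in-memory tracker on the client side. The class name and the keys `request_id` and `delivery_id` are assumptions for illustration; a real client would also persist this state and authenticate incoming webhooks.

```python
class PendingOperations:
    """Track accepted operations and match incoming webhook deliveries to them."""

    def __init__(self):
        self._request_by_operation = {}  # operation_id -> request_id
        self._seen_deliveries = set()    # delivery IDs already processed
        self.completed = {}              # request_id -> webhook payload

    def on_accepted(self, ack: dict) -> str:
        """Record the operation announced by a 202 Accepted acknowledgment."""
        self._request_by_operation[ack["operation_id"]] = ack["request_id"]
        return ack["operation_id"]

    def on_webhook(self, delivery: dict) -> bool:
        """Return True if the delivery was accepted, False if duplicate or unknown."""
        if delivery["delivery_id"] in self._seen_deliveries:
            return False  # same delivery sent twice
        expected = self._request_by_operation.get(delivery["operation_id"])
        if expected is None or expected != delivery["request_id"]:
            return False  # does not correlate with anything we started
        self._seen_deliveries.add(delivery["delivery_id"])
        del self._request_by_operation[delivery["operation_id"]]
        self.completed[expected] = delivery  # holds either "result" or "error"
        return True
```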
HAP focuses on standardizing the direct communication and self-description of individual agent endpoints. The following areas are considered complementary but separate concerns, to be addressed by other standards or system-level design:
- Global Agent Discovery: Mechanisms for agents to find the URIs of other agents they aren't already aware of (e.g., via registries or specialized search protocols). HAP defines how an agent describes itself once found.
- Complex Trust Frameworks: Establishing trust beyond the declared authentication methods (e.g., verifiable credentials, global reputation systems, "on behalf of" delegation). HAP defines how an agent states its auth needs.
- Multi-Agent Choreography/Orchestration Languages: Standardized languages for defining complex workflows involving multiple agents and their dependencies. HAP provides the bilateral communication links that an orchestrator would use.
- Global Semantic Vocabularies/Ontologies: While HAP's CDD uses JSON-LD principles to link to vocabularies, the creation and governance of these shared vocabularies (e.g., a universal standard for "summarization skill inputs") are external efforts. HAP enables their use.
- Specific Implementations of Authentication Providers: HAP defines how an agent declares its required authentication types (e.g., "Bearer token"), not how those tokens are issued or managed by identity providers.
- Detailed Legal, Ethical Frameworks, and Policy Enforcement: HAP allows linking to such documents, but their content and enforcement are outside the protocol.
HAP aims to provide a simpler, unified foundation that can enable many of the core functionalities seen in both A2A and MCP, while making different trade-offs.
- Covers: A2A's `AgentCard` and MCP's `initialize` response + `tools/list`, `resources/list`, `prompts/list`.
- HAP's approach: A single `GET` to the agent endpoint URI returns a comprehensive CDD detailing all capabilities (primary and additional actions), I/O schemas (using JSON Schema), authentication, and how async operations are handled. The use of JSON-LD principles (`@context`, `@type`, `@id`) in the CDD provides rich, linked self-description.
- Covers: A2A's `tasks/send` and MCP's `tools/call`.
- HAP's approach: A single `POST` method to the agent endpoint URI. The specific action is determined by the optional `capability_identifier` in the request body (defaulting to the endpoint's primary capability). Arguments are provided in a structured way defined by the CDD's JSON Schemas.
- Covers: MCP's `resources/read`.
- HAP's approach: Modeled as a capability. An agent endpoint can offer a "get_resource" capability (e.g., `capability_identifier: "#getResource"`). The CDD for this capability would specify how to identify the resource. If the resource has its own public URI, the "get_resource" capability might simply return that URI, allowing the client to fetch it directly.
- Covers: MCP's `prompts/get`.
- HAP's approach: Modeled as a capability. An agent endpoint can offer a "get_prompt_template" capability. The arguments would specify the template and any variables, and the result would be the processed prompt.
- Covers: A2A's `tasks/sendSubscribe` (for basic eventing/completion), `TaskStatusUpdateEvent`, `TaskArtifactUpdateEvent`, and push notifications. Also covers the need for MCP's `notifications/progress` in a simplified way.
- HAP's approach:
  - Webhooks: The `202 Accepted` response to a `POST` includes an `operation_id` (HTTP URI). Completion/failure is signaled by the agent endpoint making a `POST` to a client-provided webhook URI.
  - Polling for Status: Clients can `GET` the `operation_id` URI to poll for status if the capability supports it (declared in CDD).
  - SSE on Response: The CDD can declare if a capability's direct `POST` response will be a Server-Sent Event stream for incremental updates.
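For the SSE option, a client has to split the response stream into events. The sketch below is a minimal parser for the `data:` field of the Server-Sent Events wire format; it ignores `event:`, `id:`, and `retry:` fields. That each event carries a JSON object with keys such as `progress` or `result` is an assumption for this example, not something the text specifies.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            buffer.append(value)


# Hypothetical stream: one progress update followed by the final result.
stream = [
    'data: {"progress": 0.5}\n',
    "\n",
    ": keep-alive\n",
    'data: {"result": 3}\n',
    "\n",
]
updates = [json.loads(payload) for payload in iter_sse_data(stream)]
```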
- Covers: A2A's `Message` with `Part`s (TextPart, FilePart, DataPart) and MCP's `TextContent`, `ImageContent`, `AudioContent`.
- HAP's approach: The JSON Schemas within the CDD for `inputSchema` and `outputSchema` of any capability define how multi-modal data is structured (e.g., using base64 encoding for binary data with a MIME type field, or URIs to external media).
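The two options mentioned for multi-modal data (inline base64 with a MIME type, or a URI to external media) could look like this in practice. The key names `mimeType`, `encoding`, `data`, and `uri` are assumptions; the actual shape would be whatever the capability's schemas declare.

```python
import base64


def encode_binary_part(data: bytes, mime_type: str) -> dict:
    """Embed binary data inline as base64 text alongside its MIME type."""
    return {
        "mimeType": mime_type,
        "encoding": "base64",
        "data": base64.b64encode(data).decode("ascii"),
    }


def reference_part(uri: str, mime_type: str) -> dict:
    """Point at externally hosted media instead of embedding it."""
    return {"mimeType": mime_type, "uri": uri}


def decode_binary_part(part: dict) -> bytes:
    """Recover the original bytes from an inline part."""
    return base64.b64decode(part["data"])
```

Inline parts keep a request self-contained at the cost of roughly a third more bytes, while URI references keep the JSON small but require the receiver to fetch the media separately.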
- A2A: Has a rich, first-class `Task` object with a detailed lifecycle (submitted, working, input-required, completed, failed, canceled) and specific methods to manage it (e.g., `tasks/get`, `tasks/cancel`).
- HAP: Manages asynchronous operations via an `operation_id` URI. While status can be polled (`GET <operation_id>`) and cancellation requested (`POST <operation_id>`), it doesn't have the same explicit, detailed intermediate task states (like "input-required") built into the protocol itself unless a capability's specific workflow models it.
- MCP: Allows servers to make requests back to clients (e.g., `sampling/createMessage`, `roots/list`).
- HAP: Is primarily client-initiated. For a HAP endpoint (server) to make an arbitrary request to another HAP endpoint (client), the "client" must also expose its own HAP endpoint. Webhooks in HAP are for results/notifications of client-initiated operations.
- MCP: `notifications/.../list_changed` allows servers to push updates about their capabilities.
- HAP: Relies on clients re-fetching/re-validating the CDD using `GET` with HTTP caching mechanisms (`ETag`).
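Re-validation with `ETag` follows ordinary HTTP conditional-request semantics: the client sends the cached tag in `If-None-Match`, and a `304 Not Modified` reply means the cached CDD is still current. The sketch below keeps the network call out of the picture; `CachedCdd` and the two helper functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCdd:
    etag: Optional[str] = None
    document: Optional[dict] = None


def revalidation_headers(cache: CachedCdd) -> dict:
    """Headers for the GET that re-fetches or re-validates the CDD."""
    headers = {"Accept": "application/json"}
    if cache.etag:
        headers["If-None-Match"] = cache.etag  # ask the server to skip an unchanged body
    return headers


def apply_response(cache: CachedCdd, status: int, etag: Optional[str],
                   document: Optional[dict]) -> CachedCdd:
    """Update the cache from the server's reply."""
    if status == 304:
        return cache  # capabilities unchanged, keep the cached CDD
    if status == 200:
        return CachedCdd(etag=etag, document=document)
    raise ValueError(f"unexpected status {status}")
```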
- A2A: `tasks/resubscribe`. MCP: `resources/subscribe`, `resources/unsubscribe`.
- HAP: SSE streams are tied to a single `POST` response. There's no separate subscribe/unsubscribe for ongoing data feeds or rejoining streams beyond re-initiating the `POST` or relying on webhooks for discrete updates.
- MCP: `initialize` for capability exchange.
- HAP: Relies on the initial `GET` of the CDD. No separate session setup beyond standard HTTP.
HAP is a modern, web-friendly protocol designed to make software components (agents, tools, services) more discoverable, understandable, and usable by each other.
- It's simple: Uses basic web requests (`GET` to understand, `POST` to act).
- It's self-describing: Agents publish rich, machine-readable descriptions of their capabilities (CDDs). These descriptions use well-understood Linked Data principles (like JSON-LD and Schema.org) to define types, properties, and unique identifiers using URIs, making them more meaningful and interoperable.
- It's flexible: An "agent" can be anything from a single, focused function to a more complex service with multiple related actions, all using the same interaction pattern.
- It's robust: Built-in URI-based identifiers help manage asynchronous tasks and ensure reliable communication.
- It's interoperable: By standardizing the "how" of communication and the "style" of description (using web-standard approaches to structured, linked data), HAP allows different systems to connect and collaborate more easily.
Compared to existing protocols like A2A and MCP, HAP offers a more unified and streamlined approach by modeling most interactions as discoverable "capabilities" at a URI. It achieves many of the same goals (tool use, resource access, asynchronous tasks, multi-modal data) but with a simpler set of core protocol rules, pushing more descriptive power into the Capability Description Document. This design prioritizes web-native simplicity and a consistent interaction model.
This approach aims to reduce the complexity of building distributed systems and foster a more dynamic and interconnected ecosystem of automated capabilities, leveraging familiar web standards for data description.
Hyper Agent Protocol (HAP) Schemas
Example: Capability Description Document (CDD)
This is a snippet of a CDD for an Agent Endpoint located at `http://my.agent.com/calculator`.
The root object of the CDD describes the Agent Endpoint itself. The `input` and `output` fields for each action now reference named types (e.g., `calc:SumActionInput`). These named types would be defined elsewhere (e.g., in a HAP vocabulary or a dedicated `types` section of the CDD using `hap:PropertyDescriptor` based structures). These referenced types describe the structure of the object that will be the value of the `body` field in `hap:AgentRequest` and `hap:AgentResponse` messages, respectively. The `body` object itself will be typed with these referenced types.
Example: Invocation of the `#sum` Action (Synchronous)
Client POST `http://my.agent.com/calculator`
Request Body:
Example Synchronous Server Response (200 OK):
Here the response `@id` is locally scoped. More details may or may not be available, depending on whether the Agent defined `@actions` that allow cancellation, status checks, etc.
Response Body:
Example: Invocation of the `#sum` Action (Asynchronous with Webhook via Request `@id`)
Client POST `http://my.agent.com/calculator`
Request Body:
Server Response 1: Asynchronous Acceptance (202 Accepted)
The `@id` is a globally scoped URI, so it is itself an Agent, likely with one default `@action` of `#` or `#status`.
Server Response 2: Webhook Call to Client (Server POST to `http://calling.agent.com/webhook`)
Webhook Request Body (from HAP Agent Server to Client's Webhook):
Note on Type Definitions (e.g., `calc:SumActionInput`, `hap:PropertyDescriptor`):
The CDD's `input` and `output` fields for a `hap:Action` reference named types (e.g., `calc:SumActionInput`). These named types would be defined elsewhere (e.g., in a dedicated `"types"` section within this CDD, or in an external HAP vocabulary document referenced via `@context`). Such a type definition would specify its own `@type` (e.g., `hap:ActionInputDescriptor`) and list its properties using `hap:PropertyDescriptor`.
The type referenced by an Action's `input` field (e.g., `calc:SumActionInput`) describes the structure of the object that will be the value of the `body` field in a `hap:AgentRequest`. This inner `body` object MUST also be typed with this referenced type (e.g. `@type: "calc:SumActionInput"`). It is understood that the type referenced by an Action's `input` field is a semantic subtype of `hap:ActionInput`.
The type referenced by an Action's `output` field (e.g., `calc:SumActionOutput`) describes the structure of the object that will be the value of the `body` field in a `hap:AgentResponse`. This inner `body` object MUST also be typed with this referenced type (e.g. `@type: "calc:SumActionOutput"`). It is understood that the type referenced by an Action's `output` field is a semantic subtype of `hap:ActionOutput`.
Furthermore, the `body` property of a `hap:AgentRequest` itself is understood to expect instances of `hap:ActionInput` or its subtypes as its value. Similarly, the `body` property of a `hap:AgentResponse` expects instances of `hap:ActionOutput` or its subtypes as its value.
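The typing rule for `body` can be checked mechanically. The following sketch builds a request for the `#sum` action and verifies that the inner `body` carries the type named by the action's `input` or `output` field. The `action` key, the `operands` and `sum` fields, and the request URI are assumptions for illustration; they are not taken from the original example payloads.

```python
# Hypothetical action descriptor, shaped after the type names used in this note.
SUM_ACTION = {
    "@type": "hap:Action",
    "@id": "#sum",
    "input": "calc:SumActionInput",
    "output": "calc:SumActionOutput",
}


def make_agent_request(request_id: str, action: str, body_type: str, fields: dict) -> dict:
    """Wrap typed input fields in a hap:AgentRequest envelope."""
    return {
        "@type": "hap:AgentRequest",
        "@id": request_id,
        "action": action,  # assumed key name for selecting the action
        "body": {"@type": body_type, **fields},
    }


def check_body_type(message: dict, action: dict) -> None:
    """Raise ValueError unless body is typed with the action's declared type."""
    key = "input" if message["@type"] == "hap:AgentRequest" else "output"
    expected = action[key]
    actual = message["body"].get("@type")
    if actual != expected:
        raise ValueError(f"body @type is {actual!r}, expected {expected!r}")


request = make_agent_request(
    "http://calling.agent.com/requests/1", "#sum",
    "calc:SumActionInput", {"operands": [1, 2]},
)
check_body_type(request, SUM_ACTION)  # passes silently
```

This only compares type names for equality; recognizing subtypes of `hap:ActionInput` or `hap:ActionOutput` would require resolving the vocabulary referenced via `@context`.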