@webr3
Last active May 11, 2025 13:17

Hyper Agent Protocol (HAP) - An Overview

What is HAP?

The Hyper Agent Protocol (HAP) is a new, simplified standard for how software "agents" (or any automated services and capabilities) communicate with each other over the web. Imagine a universal way for different software components to discover what other components can do and then ask them to do it, all using standard web technologies.

The goal of HAP is to make it much easier to build flexible, interoperable systems where different services, tools, or AI agents can work together seamlessly, regardless of how they are built internally.

Core Idea: Every Capability has a Web Address (URI)

At its heart, HAP treats every distinct capability or function as something that has its own unique web address (an HTTP URI).

1. Discovering What an Agent Can Do (The "Agent Descriptor")

  • To find out what a specific agent endpoint can do, you simply make a standard web request (GET) to its URI.
  • The agent endpoint responds with a "Capability Description Document" (CDD). This CDD is a JSON file that clearly lists:
    • The primary capability the endpoint offers by default.
    • Optionally, any additional related actions it can perform (distinguished by simple names like #resize or #summarize used as fragment identifiers, making them part of the main URI).
    • The expected inputs and outputs for each capability, defined using standard JSON Schemas. This ensures clarity on what data to send and what to expect back.
    • Authentication methods required to use the capabilities.
  • Crucially, this CDD is structured using Linked Data principles, similar to JSON-LD and how Schema.org is used. This means the JSON is not just data, but self-describing data:
    • It uses an @context (a standard JSON-LD field) to define short names for terms and to link properties to well-defined vocabularies (like Schema.org or custom ones). This helps avoid ambiguity.
    • Objects and descriptions within the CDD can have an @type (a JSON-LD keyword) which specifies their type using a URI (e.g., a URI defining what a "CapabilityDescription" is, much like Schema.org defines types like schema:Person or schema:Product).
    • Specific pieces of information or descriptions can be given a globally unique @id (a JSON-LD keyword) using a URI, making them linkable and referenceable.
    • This approach makes the CDD highly structured, machine-readable, and allows agents to better understand the meaning and structure of the capabilities offered by linking them to shared definitions.

2. Asking an Agent to Perform an Action

  • Once a client knows what an agent endpoint can do (from its CDD), it asks it to perform an action using a standard web POST request to the same agent endpoint URI.
  • The POST request includes:
    • A unique request ID (as an HTTP URI) so both client and server can track this specific request and avoid accidental duplicates.
    • Optionally, a capability identifier (usually a short fragment like "#resize", or empty if invoking the default primary capability).
    • The arguments (data) needed for the capability, structured according to the JSON Schema described in the CDD.
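
As a rough illustration of the three parts listed above, the following Python sketch assembles a request body. The field names ("@id", "@type", "@action", "body") are borrowed from the example messages later in this gist; the `build_request` helper, the `req-` ID scheme, and the choice to omit "@action" for the default capability are assumptions made for this sketch, not part of HAP.

```python
import uuid
from typing import Optional


def build_request(request_base: str, arguments: dict,
                  action: Optional[str] = None) -> dict:
    """Assemble a request body; field names follow the gist's example messages."""
    message = {
        # Hypothetical ID scheme: a client-owned URI plus a random fragment.
        "@id": f"{request_base}#req-{uuid.uuid4().hex}",
        "@type": "hap:AgentRequest",
        "body": arguments,
    }
    if action is not None:
        # Assumption: leaving "@action" out selects the default capability.
        message["@action"] = action
    return message


sum_request = build_request(
    "http://calling.agent.com/webhook",
    {"@type": "calc:SumActionInput", "a": 10, "b": 5},
    action="#sum",
)
default_request = build_request("http://calling.agent.com/webhook", {})
```

Minting the request ID under a URI the client controls keeps the ID globally unique and lets the server use it for duplicate detection.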

The Nuance: An "Agent" and a "Skill" Can Be (Almost) the Same Thing

A key concept in HAP is the flexible definition of an "agent endpoint":

  • An agent endpoint is simply an HTTP URI that offers one or more capabilities according to the HAP protocol.
  • An "agent" can be just a single, focused skill: If an endpoint offers only one primary capability (its default action), then that endpoint URI essentially is that skill. Invoking the endpoint invokes that skill.
  • An "agent" can be a collection of related skills: An endpoint can offer a primary capability and a few additional, closely related actions. These additional actions are invoked by specifying their short name (fragment) in the POST request.
  • This blurs the lines: a highly specialized "tool" (one skill) and a more complex "agent" (multiple skills) are interacted with using the exact same HAP protocol. The difference lies in the richness of their Capability Description Document.

How Actions Work (Request/Response Types)

HAP defines clear ways for actions to occur, with JSON Schemas defining the data formats for requests and responses:

1. Normal (Synchronous) Actions

  • The client sends a POST request.
  • The agent endpoint performs the action immediately and sends back a standard 200 OK response containing the result (e.g., the summarized text, the processed data).
  • The response also includes a unique response ID (as an HTTP URI) and echoes the client's request ID.

2. Long-Running Actions (Asynchronous via Webhooks)

  • For actions that take time, the client sends a POST request.
  • The agent endpoint immediately responds with a 202 Accepted message. This message includes:
    • A unique operation ID (an HTTP URI). This ID represents the ongoing task and its URI can be used for status checks or cancellation requests.
    • The client's request ID and a unique response ID for this acknowledgment.
  • When the agent endpoint finishes the task, it sends a new POST request (a "webhook") to a callback URI provided by the client. This webhook contains the final result or an error.
  • This webhook also includes its own unique delivery ID and the original request ID and operation ID for correlation.
  • Clients can also (if the agent endpoint supports it) GET the operation ID URI to check the status of the ongoing task, or POST to it to request cancellation.
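
One way a client might tell the two flows apart is by HTTP status code, as in the Python sketch below. The `Outcome` type and `interpret_response` helper are hypothetical. Reading the operation URI from the "@id" of the 202 message is an assumption based on the 202 example later in this gist; the overview implies a separate operation ID field but does not name it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    done: bool                     # True when the result is already available
    result: Optional[dict]         # the response body of a completed action
    operation_uri: Optional[str]   # where to poll or cancel while still running


def interpret_response(status: int, message: dict) -> Outcome:
    """Map an HTTP status and a parsed JSON message to a client-side outcome."""
    if status == 200:
        # Synchronous action: the result is carried in the response itself.
        return Outcome(True, message.get("body"), None)
    if status == 202:
        # Long-running action: the result arrives later via webhook.
        # Assumption: the 202 message's "@id" doubles as the operation URI.
        return Outcome(False, None, message.get("@id"))
    raise ValueError(f"unexpected HTTP status {status}")


sync = interpret_response(200, {"body": {"total": 15}})
pending = interpret_response(
    202, {"@id": "http://my.agent.com/calculator/responses/respSynchronousABC"}
)
```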

What is Classed as Out of Scope for HAP

HAP focuses on standardizing the direct communication and self-description of individual agent endpoints. The following areas are considered complementary but separate concerns, to be addressed by other standards or system-level design:

  • Global Agent Discovery: Mechanisms for agents to find the URIs of other agents they aren't already aware of (e.g., via registries or specialized search protocols). HAP defines how an agent describes itself once found.
  • Complex Trust Frameworks: Establishing trust beyond the declared authentication methods (e.g., verifiable credentials, global reputation systems, "on behalf of" delegation). HAP defines how an agent states its auth needs.
  • Multi-Agent Choreography/Orchestration Languages: Standardized languages for defining complex workflows involving multiple agents and their dependencies. HAP provides the bilateral communication links that an orchestrator would use.
  • Global Semantic Vocabularies/Ontologies: While HAP's CDD uses JSON-LD principles to link to vocabularies, the creation and governance of these shared vocabularies (e.g., a universal standard for "summarization skill inputs") are external efforts. HAP enables their use.
  • Specific Implementations of Authentication Providers: HAP defines how an agent declares its required authentication types (e.g., "Bearer token"), not how those tokens are issued or managed by identity providers.
  • Detailed Legal, Ethical Frameworks, and Policy Enforcement: HAP allows linking to such documents, but their content and enforcement are outside the protocol.

HAP's Relationship to A2A and MCP Functionality

HAP aims to provide a simpler, unified foundation that can enable many of the core functionalities seen in both A2A and MCP, while making different trade-offs.

Functionalities HAP Enables (Similar to A2A/MCP)

1. Agent/Tool Self-Description (CDD via GET)
  • Covers: A2A's AgentCard and MCP's initialize response + tools/list, resources/list, prompts/list.
  • HAP's approach: A single GET to the agent endpoint URI returns a comprehensive CDD detailing all capabilities (primary and additional actions), I/O schemas (using JSON Schema), authentication, and how async operations are handled. The use of JSON-LD principles (@context, @type, @id) in the CDD provides rich, linked self-description.
2. Invoking Actions/Skills/Tools (via POST)
  • Covers: A2A's tasks/send and MCP's tools/call.
  • HAP's approach: A single POST method to the agent endpoint URI. The specific action is determined by the optional capability_identifier in the request body (defaulting to the endpoint's primary capability). Arguments are provided in a structured way defined by the CDD's JSON Schemas.
3. Handling Data Resources
  • Covers: MCP's resources/read.
  • HAP's approach: Modeled as a capability. An agent endpoint can offer a "get_resource" capability (e.g., capability_identifier: "#getResource"). The CDD for this capability would specify how to identify the resource. If the resource has its own public URI, the "get_resource" capability might simply return that URI, allowing the client to fetch it directly.
4. Handling Prompt Templates
  • Covers: MCP's prompts/get.
  • HAP's approach: Modeled as a capability. An agent endpoint can offer a "get_prompt_template" capability. The arguments would specify the template and any variables, and the result would be the processed prompt.
5. Asynchronous Operations & Streaming (Basic)
  • Covers: A2A's tasks/sendSubscribe (for basic eventing/completion), TaskStatusUpdateEvent, TaskArtifactUpdateEvent, and push notifications. Also covers the need for MCP's notifications/progress in a simplified way.
  • HAP's approach:
    • Webhooks: 202 Accepted response to a POST includes an operation_id (HTTP URI). Completion/failure is signaled by the agent endpoint making a POST to a client-provided webhook URI.
    • Polling for Status: Clients can GET the operation_id URI to poll for status if the capability supports it (declared in CDD).
    • SSE on Response: The CDD can declare if a capability's direct POST response will be a Server-Sent Event stream for incremental updates.
6. Multi-modal Data Exchange
  • Covers: A2A's Message with Parts (TextPart, FilePart, DataPart) and MCP's TextContent, ImageContent, AudioContent.
  • HAP's approach: The JSON Schemas within the CDD for inputSchema and outputSchema of any capability define how multi-modal data is structured (e.g., using base64 encoding for binary data with a MIME type field, or URIs to external media).
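
To make the base64 option in item 6 concrete, here is a minimal Python sketch of packing binary data together with its MIME type. The property names "mimeType" and "data" are hypothetical; in HAP the actual names would be whatever the capability's input and output schemas in the CDD declare.

```python
import base64


def encode_binary_part(data: bytes, mime_type: str) -> dict:
    """Wrap raw bytes as a JSON-safe object (hypothetical property names)."""
    return {
        "mimeType": mime_type,
        "data": base64.b64encode(data).decode("ascii"),
    }


def decode_binary_part(part: dict) -> bytes:
    """Recover the raw bytes from an object produced by encode_binary_part."""
    return base64.b64decode(part["data"])


part = encode_binary_part(b"hi", "text/plain")
round_tripped = decode_binary_part(part)
```

Base64 inflates the payload by roughly a third, which is one reason a schema might prefer a URI to external media for large files.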

Functionalities Handled Differently or "Missed" by HAP (by design, for simplicity)

1. Explicit Stateful Task Object & Lifecycle Management (A2A)
  • A2A: Has a rich, first-class Task object with a detailed lifecycle (submitted, working, input-required, completed, failed, canceled) and specific methods to manage it (e.g., tasks/get, tasks/cancel).
  • HAP: Manages asynchronous operations via an operation_id URI. While status can be polled (GET <operation_id>) and cancellation requested (POST <operation_id>), it doesn't have the same explicit, detailed intermediate task states (like "input-required") built into the protocol itself unless a capability's specific workflow models it.
2. Server-to-Client RPC (MCP)
  • MCP: Allows servers to make requests back to clients (e.g., sampling/createMessage, roots/list).
  • HAP: Is primarily client-initiated. For a HAP endpoint (server) to make an arbitrary request to another HAP endpoint (client), the "client" must also expose its own HAP endpoint. Webhooks in HAP are for results/notifications of client-initiated operations.
3. Proactive Server-Side Notifications for Capability Changes (MCP)
  • MCP: notifications/.../list_changed allows servers to push updates about their capabilities.
  • HAP: Relies on clients re-fetching/re-validating the CDD using GET with HTTP caching mechanisms (ETag).
4. Fine-grained Streaming/Subscription Management (A2A & MCP)
  • A2A: tasks/resubscribe. MCP: resources/subscribe, resources/unsubscribe.
  • HAP: SSE streams are tied to a single POST response. There's no separate subscribe/unsubscribe for ongoing data feeds or rejoining streams beyond re-initiating the POST or relying on webhooks for discrete updates.
5. Explicit Session Initialization (MCP)
  • MCP: initialize for capability exchange.
  • HAP: Relies on the initial GET of the CDD. No separate session setup beyond standard HTTP.
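
The ETag-based re-validation mentioned in item 3 could look roughly like the Python sketch below. `CddCache` is a hypothetical helper that only tracks state; the caller performs the actual GET with any HTTP client and passes in the status, headers, and parsed document. `If-None-Match` and `304 Not Modified` are standard HTTP, not HAP-specific.

```python
from typing import Optional


class CddCache:
    """Remembers one endpoint's CDD together with the ETag it was served with."""

    def __init__(self) -> None:
        self.etag: Optional[str] = None
        self.document: Optional[dict] = None

    def request_headers(self) -> dict:
        # Send If-None-Match only when a cached copy exists to re-validate.
        return {"If-None-Match": self.etag} if self.etag else {}

    def update(self, status: int, headers: dict,
               document: Optional[dict]) -> dict:
        if status == 304 and self.document is not None:
            # Not modified: the cached CDD is still current.
            return self.document
        if status == 200 and document is not None:
            # New or changed CDD: replace the cached copy and its validator.
            self.etag = headers.get("ETag")
            self.document = document
            return document
        raise ValueError(f"cannot refresh CDD from HTTP status {status}")


cache = CddCache()
first_headers = cache.request_headers()
first = cache.update(200, {"ETag": '"v1"'}, {"name": "Simple Calculator Agent"})
second_headers = cache.request_headers()
second = cache.update(304, {}, None)
```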

In Summary for Stakeholders (HAP vs. A2A/MCP)

HAP is a modern, web-friendly protocol designed to make software components (agents, tools, services) more discoverable, understandable, and usable by each other.

  • It's simple: Uses basic web requests (GET to understand, POST to act).
  • It's self-describing: Agents publish rich, machine-readable descriptions of their capabilities (CDDs). These descriptions use well-understood Linked Data principles (like JSON-LD and Schema.org) to define types, properties, and unique identifiers using URIs, making them more meaningful and interoperable.
  • It's flexible: An "agent" can be anything from a single, focused function to a more complex service with multiple related actions, all using the same interaction pattern.
  • It's robust: Built-in URI-based identifiers help manage asynchronous tasks and ensure reliable communication.
  • It's interoperable: By standardizing the "how" of communication and the "style" of description (using web-standard approaches to structured, linked data), HAP allows different systems to connect and collaborate more easily.

Compared to existing protocols like A2A and MCP, HAP offers a more unified and streamlined approach by modeling most interactions as discoverable "capabilities" at a URI. It achieves many of the same goals (tool use, resource access, asynchronous tasks, multi-modal data) but with a simpler set of core protocol rules, pushing more descriptive power into the Capability Description Document. This design prioritizes web-native simplicity and a consistent interaction model.

This approach aims to reduce the complexity of building distributed systems and foster a more dynamic and interconnected ecosystem of automated capabilities, leveraging familiar web standards for data description.

@webr3
Author

webr3 commented May 10, 2025

Hyper Agent Protocol (HAP) Schemas

Example: Capability Description Document (CDD)

This is a snippet of a CDD for an Agent Endpoint located at http://my.agent.com/calculator.
The root object of the CDD describes the Agent Endpoint itself. The input and output fields of each action reference named types (e.g., calc:SumActionInput). These named types would be defined elsewhere (e.g., in a HAP vocabulary or in a dedicated types section of the CDD, using structures based on hap:PropertyDescriptor). Each referenced type describes the structure of the object carried in the body field of a hap:AgentRequest or hap:AgentResponse message, respectively, and the body object itself is typed with that referenced type.

{
  "@id": "http://my.agent.com/calculator",
  "@type": "http://hap.dev/vocab#Agent",
  "@context": {
    "hap": "http://hap.dev/vocab#",
    "calc": "http://my.agent.com/calculator/vocab#",
    "remote": "https://your.agent.com/vocab#"
  },
  "name": "Simple Calculator Agent",
  "description": "A HAP-compliant agent that can perform basic calculations.",
  "@actions": [
    {
      "@id": "#",
      "@type": "hap:Action",
      "description": "Default action.",
      "output": "calc:UsageInfoOutput"
    },
    {
      "@id": "#sum",
      "@type": "hap:Action",
      "description": "Adds two numbers 'a' and 'b' and returns their sum.",
      "input": "calc:SumActionInput",
      "output": "calc:SumActionOutput"
    },
    {
      "@id": "https://your.agent.com/calculator#multiply",
      "@type": "hap:Action",
      "description": "Multiplies two numbers 'a' and 'b' (a remote agent action).",
      "input": "remote:MultiplyActionInput",
      "output": "remote:MultiplyActionOutput"
    },
    {
      "@id": "https://some.agent.com/actions/divide#",
      "@type": "hap:Action",
      "description": "Divides two numbers 'a' and 'b' (a remote agent default action, similar to a single MCP action)."
    }
  ]
}
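
The CDD above mixes fragment-only action IDs ("#", "#sum") with absolute ones that point at other endpoints. A client might normalize them as in this Python sketch, which assumes that fragment-only IDs resolve against the root "@id" of the CDD; the helper names are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urldefrag


def resolve_action_id(endpoint: str, action_id: str) -> str:
    """Turn a fragment-only action ID into an absolute URI."""
    if action_id.startswith("#"):
        # Assumption: local fragments are relative to the CDD's root "@id".
        return endpoint + action_id
    return action_id


def split_actions(cdd: dict) -> Tuple[List[str], List[str]]:
    """Return (local, remote) absolute action URIs declared in a CDD."""
    endpoint = cdd["@id"]
    local: List[str] = []
    remote: List[str] = []
    for action in cdd.get("@actions", []):
        absolute = resolve_action_id(endpoint, action["@id"])
        # The URI without its fragment identifies the endpoint to POST to.
        owner = urldefrag(absolute).url
        (local if owner == endpoint else remote).append(absolute)
    return local, remote


example_cdd = {
    "@id": "http://my.agent.com/calculator",
    "@actions": [
        {"@id": "#"},
        {"@id": "#sum"},
        {"@id": "https://your.agent.com/calculator#multiply"},
        {"@id": "https://some.agent.com/actions/divide#"},
    ],
}
local_actions, remote_actions = split_actions(example_cdd)
```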

Example: Invocation of the #sum Action (Synchronous)

Client POST http://my.agent.com/calculator

Request Body:

{
  "@id": "#reqSynchronousXYZ",
  "@type": "hap:AgentRequest",
  "@action": "#sum",
  "body": {
    "@type": "calc:SumActionInput",
    "a": 10,
    "b": 5
  }
}

Example Synchronous Server Response (200 OK):

Here the response @id is locally scoped. Further details may or may not be available, depending on whether the Agent defines @actions that allow cancellation, status checks, etc.

Response Body:

{
  "@id": "#respSynchronousABC",
  "@type": "hap:AgentResponse",
  "request": "#reqSynchronousXYZ",
  "@action": "http://my.agent.com/calculator#sum",
  "body": {
    "@type": "calc:SumActionOutput",
    "total": 15
  }
}
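
A client may want to confirm that a response really belongs to the request it sent. The Python sketch below checks the two correlations visible in the example above: the echoed request "@id", and the "@action" expanded to an absolute URI. `check_response` is a hypothetical helper, and treating a missing "@action" as "#" is an assumption.

```python
from typing import List


def check_response(endpoint: str, request: dict, response: dict) -> List[str]:
    """Return a list of correlation problems; an empty list means a match."""
    problems: List[str] = []
    if response.get("request") != request.get("@id"):
        problems.append("response does not echo the request @id")
    # Assumption: a missing "@action" means the default action "#".
    expected_action = request.get("@action", "#")
    if expected_action.startswith("#"):
        expected_action = endpoint + expected_action
    if response.get("@action") != expected_action:
        problems.append("response @action does not match the requested action")
    return problems


request = {"@id": "#reqSynchronousXYZ", "@action": "#sum"}
response = {
    "@id": "#respSynchronousABC",
    "request": "#reqSynchronousXYZ",
    "@action": "http://my.agent.com/calculator#sum",
}
problems = check_response("http://my.agent.com/calculator", request, response)
```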

Example: Invocation of the #sum Action (Asynchronous with Webhook via Request @id)

Client POST http://my.agent.com/calculator

Request Body:

{
  "@id": "http://calling.agent.com/webhook#reqXYZ",
  "@type": "hap:AgentRequest",
  "@action": "#sum",
  "body": {
    "@type": "calc:SumActionInput",
    "a": 10,
    "b": 5
  }
}

Server Response 1: Asynchronous Acceptance (202 Accepted)

The @id is a globally scoped URI, so it is itself an Agent, likely with a single default @action of # or #status.

{
  "@id": "http://my.agent.com/calculator/responses/respSynchronousABC",
  "@type": "hap:AgentResponse",
  "request": "http://calling.agent.com/webhook#reqXYZ",
  "@action": "http://my.agent.com/calculator#sum",
}

Server Response 2: Webhook Call to Client (Server POST to http://calling.agent.com/webhook)

Webhook Request Body (from HAP Agent Server to Client's Webhook):

{
  "@id": "http://my.agent.com/calculator/responses/respSynchronousABC",
  "@type": "hap:AgentResponse",
  "request": "http://calling.agent.com/webhook#reqXYZ",
  "@action": "http://my.agent.com/calculator#sum",
  "body": {
    "@type": "calc:SumActionOutput",
    "total": 15
  }
}
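
In this example the webhook URL appears to be the request "@id" with its fragment removed. Under that assumption, a client could derive the callback URL and match incoming deliveries to outstanding requests as in this Python sketch; `PendingRequests` is a hypothetical helper, not something HAP defines.

```python
from typing import Dict, Optional
from urllib.parse import urldefrag


class PendingRequests:
    """Tracks requests that are still waiting for a webhook delivery."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}

    def register(self, request: dict) -> str:
        """Remember a request and return the URL its webhook should arrive at."""
        self._pending[request["@id"]] = request
        # Assumption: the callback URL is the request "@id" minus its fragment.
        return urldefrag(request["@id"]).url

    def deliver(self, webhook_message: dict) -> Optional[dict]:
        """Return the result body, or None for unknown or repeated deliveries."""
        request = self._pending.pop(webhook_message.get("request"), None)
        if request is None:
            return None
        return webhook_message.get("body")


pending = PendingRequests()
callback_url = pending.register({"@id": "http://calling.agent.com/webhook#reqXYZ"})
delivery = {
    "request": "http://calling.agent.com/webhook#reqXYZ",
    "body": {"@type": "calc:SumActionOutput", "total": 15},
}
result = pending.deliver(delivery)
repeat = pending.deliver(delivery)
```

Removing the entry on first delivery means a webhook that is retried by the server is ignored rather than processed twice.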

Note on Type Definitions (e.g., calc:SumActionInput, hap:PropertyDescriptor):

The input and output fields of a hap:Action in the CDD reference named types (e.g., calc:SumActionInput). These named types would be defined elsewhere (e.g., in a dedicated "types" section within this CDD, or in an external HAP vocabulary document referenced via @context). Such a type definition would specify its own @type (e.g., hap:ActionInputDescriptor) and list its properties using hap:PropertyDescriptor.

The type referenced by an Action's input field (e.g., calc:SumActionInput) describes the structure of the object that will be the value of the body field in a hap:AgentRequest. This inner body object MUST also be typed with this referenced type (e.g., @type: "calc:SumActionInput"). It is understood that the type referenced by an Action's input field is a semantic subtype of hap:ActionInput.

The type referenced by an Action's output field (e.g., calc:SumActionOutput) describes the structure of the object that will be the value of the body field in a hap:AgentResponse. This inner body object MUST also be typed with this referenced type (e.g., @type: "calc:SumActionOutput"). It is understood that the type referenced by an Action's output field is a semantic subtype of hap:ActionOutput.

Furthermore, the body property of a hap:AgentRequest itself is understood to expect instances of hap:ActionInput or its subtypes as its value. Similarly, the body property of a hap:AgentResponse expects instances of hap:ActionOutput or its subtypes as its value.
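
The MUST rule on the body's @type can be checked mechanically. The Python sketch below compares the body's @type with the input type the CDD declares for the requested action, after a simplified prefix expansion against @context. This is not full JSON-LD processing, subtype relationships are not considered, and the helper names are hypothetical.

```python
from typing import Optional


def expand(term: str, context: dict) -> str:
    """Expand a compact term like 'calc:X' using string prefixes in @context."""
    prefix, sep, suffix = term.partition(":")
    mapping = context.get(prefix)
    if sep and isinstance(mapping, str):
        return mapping + suffix
    return term  # already absolute, or the prefix is unknown


def declared_input(cdd: dict, action_id: str) -> Optional[str]:
    """Find the input type the CDD declares for an action, if any."""
    for action in cdd.get("@actions", []):
        if action.get("@id") == action_id:
            return action.get("input")
    return None


def body_type_matches(cdd: dict, request: dict) -> bool:
    """True when the request body's @type equals the declared input type."""
    context = cdd.get("@context", {})
    declared = declared_input(cdd, request.get("@action", "#"))
    actual = request.get("body", {}).get("@type")
    if declared is None or actual is None:
        return False
    return expand(actual, context) == expand(declared, context)


cdd = {
    "@context": {"calc": "http://my.agent.com/calculator/vocab#"},
    "@actions": [{"@id": "#sum", "input": "calc:SumActionInput"}],
}
good = {"@action": "#sum", "body": {"@type": "calc:SumActionInput", "a": 10, "b": 5}}
untyped = {"@action": "#sum", "body": {"a": 10, "b": 5}}
```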
