@nazt
Created April 20, 2026 15:28
hermes-agent deep /learn — philosophy via issues+commits, testing patterns, API surface (2026-04-20)

Hermes Agent — API Surface Document

Project: NousResearch/hermes-agent
Date: 2026-04-20
Scope: Integration surfaces for external systems communicating with Hermes

This document catalogs the points where external systems can integrate with Hermes Agent: CLI commands, MCP server/client modes, ACP (Agent Client Protocol), messaging platforms, plugins, tools, skills, webhooks, cron, and Python embedding.


1. CLI Public Surface

Entry point: hermes command (alias: python cli.py or direct module invocation)
Parser setup: hermes_cli/main.py lines 6335–7820 (argparse subparsers)
Command execution callbacks: hermes_cli/main.py lines 1021–6272 (cmd_* functions)

Major Command Groups

Chat & Sessions

  • hermes → Interactive REPL (default; no subcommand)
  • hermes chat → Direct chat query (-q, --prompt)
  • hermes sessions list → List recent sessions
  • hermes sessions export → Export session history (JSON/markdown)
  • hermes sessions delete <session_id> → Remove a session
  • hermes sessions prune → Delete old sessions (--source, --days)

Model & Auth

  • hermes model [provider:model] → Switch active model
  • hermes login <provider> → Authenticate to external services (--scope)
  • hermes logout <provider> → Remove stored credentials
  • hermes auth add → Pool credentials (--label, --portal-url, --inference-url)
  • hermes auth list → List pooled credentials
  • hermes auth remove <provider> → Remove pooled credential

Gateway & Messaging Platforms

  • hermes gateway run → Start messaging gateway (Telegram, Discord, Slack, etc.)
  • hermes gateway start → Start as background service
  • hermes gateway stop → Stop background service
  • hermes gateway status → Check platform health (--deep)
  • hermes gateway setup → Interactive platform configuration wizard
  • hermes setup → Full system setup wizard

Skills & Toolsets

  • hermes tools → Configure enabled tools
  • hermes skills browse → Browse Skills Hub (categories, filters)
  • hermes skills search <query> → Search for skills (--limit)
  • hermes skills install <identifier> → Install a skill from registry
  • hermes skills list → Show installed skills
  • hermes skills tap → Manage skill sources (GitHub repos)

Plugins

  • hermes plugins install <url|name> → Add a plugin
  • hermes plugins list → Show installed plugins
  • hermes plugins enable <name> → Activate a plugin
  • hermes plugins disable <name> → Deactivate a plugin
  • hermes plugins remove <name> → Uninstall a plugin

Memory & Context

  • hermes memory → View and manage persistent memory (--export, --import)

Scheduled Tasks

  • hermes cron list → List all scheduled jobs (--all for disabled)
  • hermes cron create → Add a new scheduled task (--name, --deliver)
  • hermes cron edit <job_id> → Modify schedule or prompt
  • hermes cron pause <job_id> → Suspend a job
  • hermes cron resume <job_id> → Resume a paused job
  • hermes cron run <job_id> → Trigger immediately
  • hermes cron remove <job_id> → Delete a job

Webhooks

  • hermes webhook subscribe <name> → Register webhook route (--channel, --description)
  • hermes webhook list → Show active subscriptions
  • hermes webhook remove <name> → Unsubscribe
  • hermes webhook test <name> → Send test payload

MCP Integration

  • hermes mcp add <name> → Register MCP server (--command, --url, --auth)
  • hermes mcp list → Show configured servers
  • hermes mcp test <name> → Verify connectivity
  • hermes mcp remove <name> → Deregister server

System Commands

  • hermes config set <key> [value] → Set config option
  • hermes config show → Display current config
  • hermes config migrate → Upgrade config schema
  • hermes status → Show agent health
  • hermes doctor → Diagnose issues (dependencies, config, auth)
  • hermes version → Display version info
  • hermes update → Upgrade to latest release
  • hermes backup → Create encrypted backup archive
  • hermes import <zipfile> → Restore from backup

2. MCP Server Mode

Hermes as MCP server — exports tools to external MCP clients.
Entry: hermes mcp serve (planned; currently tools exposed via ACP only)
Module: tools/mcp_tool.py (currently MCP client only; server mode being added)

Current State: MCP Client Only

Hermes consumes external MCP servers via the config:

mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    timeout: 120
  github:
    url: "https://mcp-server.example.com/mcp"
    headers:
      Authorization: "Bearer sk-..."

Client code: tools/mcp_tool.py lines 1–300 (connection + discovery)

  • Stdio transport: command + args (spawn subprocess)
  • HTTP/StreamableHTTP: url + optional headers and auth
  • Tool discovery: MCP ListTools → agent registry injection with namespace prefix
  • Authentication: oauth and header modes for paid MCP endpoints
  • Sampling: MCP servers can request LLM completions back to Hermes (configurable limits)

MCP Configuration: hermes_cli/mcp_config.py lines 1–400 (parsing, validation, auth setup)


3. MCP Client Mode

Config file: ~/.hermes/config.yaml (mcp_servers section)
Client implementation: tools/mcp_tool.py lines 100–600 (async client loop)

Authentication Methods

  1. OAuth (--auth oauth)

    • Redirects to provider OAuth endpoint
    • Stores refresh token in ~/.hermes/secrets/mcp-<name>.json
    • Automatic token refresh before expiration
  2. Header-based (--auth header)

    • Static bearer token or custom header in request
    • Stored in config or env var (e.g., MCP_<NAME>_TOKEN)
  3. Environment variables (default)

    • Resolved from shell env; no persistent storage
    • Suitable for ephemeral containers

Tool Namespacing

MCP tools are prefixed with their server name:

  • Server github with tool search_issues → github:search_issues (internal)
  • Display name: "Search Issues (github)" in UI

Namespace collision resolution: Last-registered server wins (reload order: bundled → user → config)
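The prefixing and display-name convention above can be sketched as a registration helper (hypothetical function and registry shape, not the actual Hermes registry API):

```python
def register_mcp_tools(registry: dict, server: str, tool_names: list[str]) -> dict:
    """Sketch of MCP tool namespacing.

    Each tool is keyed as "<server>:<tool>"; because dict writes overwrite,
    the last-registered server wins on collision, matching the reload order
    described above (bundled -> user -> config).
    """
    for name in tool_names:
        registry[f"{server}:{name}"] = {
            "server": server,
            "tool": name,
            # Display name like "Search Issues (github)"
            "display": f"{name.replace('_', ' ').title()} ({server})",
        }
    return registry
```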


4. ACP (Agent Client Protocol) Server

Purpose: Editor integration (VS Code, Zed, JetBrains, Cursor, Windsurf)
Server code: acp_adapter/server.py (full ACP 0.9.0 spec)
Lifecycle entry: acp_adapter/entry.py (server startup)

Key Components

| Component | File | Purpose |
| --- | --- | --- |
| HermesACPAgent | acp_adapter/server.py:95 | Main ACP agent class (subclass of acp.Agent) |
| SessionManager | acp_adapter/session.py | Manages editor session state, tool calls, streaming |
| MessageHandler | acp_adapter/events.py | Converts agent events → ACP protocol messages |
| Permissions | acp_adapter/permissions.py | Command approval callback (approval workflow) |

ACP Messages Handled

  • initialize() → Returns agent capabilities (models, tools, MCP servers)
  • new_session() → Create new chat session
  • load_session(session_id) → Restore saved conversation
  • fork_session() → Branch conversation
  • send(session_id, prompt, mode) → Process user query (mode = chat, task, architect)
  • set_session_model() → Switch LLM mid-session
  • set_session_config_option() → Update session settings
  • cancel_task() → Interrupt current work
  • list_sessions() → Browse saved chats
  • /slash commands → Exposed via _SLASH_COMMANDS dict (/help, /model, /memory, /tools)

Streaming Protocol

Agent responses stream as protocol events:

  • message_chunk — LLM text tokens (progressive rendering)
  • tool_call_start / tool_call_complete — Tool execution events
  • step_complete — Full turn finished
  • usage — Token count + cost estimate
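A client consuming this stream might fold the events into a final turn like the sketch below. The event dict shapes (`"type"`, `"text"`, `"tool"` keys) are illustrative assumptions, not the ACP wire format:

```python
def render_stream(events):
    """Hypothetical consumer of the streaming events listed above.

    Accumulates message_chunk text, collects tool-call names, and captures
    the usage event; stops at step_complete (end of turn).
    """
    text_parts, tool_calls, usage = [], [], None
    for event in events:
        kind = event["type"]
        if kind == "message_chunk":
            text_parts.append(event["text"])   # progressive rendering
        elif kind == "tool_call_start":
            tool_calls.append(event["tool"])
        elif kind == "usage":
            usage = event                      # token count + cost estimate
        elif kind == "step_complete":
            break                              # full turn finished
    return "".join(text_parts), tool_calls, usage
```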

5. Gateway: Messaging Platform Adapters

Base class: gateway/platforms/base.py line 887 (BasePlatformAdapter)
Directory: gateway/platforms/ (27 adapters)
Factory: gateway/run.py (_create_adapter())
Config: ~/.hermes/config.yaml (platforms section)

Supported Platforms (15+)

| Platform | File | Status |
| --- | --- | --- |
| Telegram | telegram.py | Full support (groups, channels, PM, media) |
| Discord | discord.py | Full (threads, slash commands, reactions) |
| Slack | slack.py | Full (threads, message updates, blocks) |
| WhatsApp | whatsapp.py | Full (via WhatsApp Business API) |
| Signal | signal.py | Full (via signald daemon) |
| Weixin (WeChat) | weixin.py | Full (groups, official accounts, mini programs) |
| WeChat Enterprise | wecom.py | Full (message callbacks, approvals) |
| Feishu (Lark) | feishu.py | Full (doc creation, file management) |
| DingTalk | dingtalk.py | Full (robot messages, card interaction) |
| Mattermost | mattermost.py | Full (self-hosted Slack alternative) |
| Matrix | matrix.py | Full (Synapse homeserver) |
| QQ Bot | qqbot/ | Full (QQ groups and DMs) |
| Email | email.py | Limited (inbound IMAP) |
| SMS | sms.py | Limited (Twilio provider) |
| Home Assistant | homeassistant.py | Limited (notification delivery only) |
| Webhook | webhook.py | Generic HTTP POST subscriptions |
| BlueBubbles | bluebubbles.py | iMessage relay from macOS |

Required Methods (All Adapters)

class BasePlatformAdapter(ABC):
    def __init__(self, config: PlatformConfig, platform: Platform):
        """Parse config, initialize state."""
    
    async def connect(self) -> bool:
        """Establish connection. Return True on success."""
    
    async def disconnect(self) -> None:
        """Stop listeners, close connections."""
    
    async def send(self, chat_id: str, text: str, **opts) -> SendResult:
        """Send text message. Return success/failure + message_id."""
    
    async def send_typing(self, chat_id: str) -> None:
        """Send typing indicator (ephemeral)."""
    
    async def send_image(self, chat_id: str, url: str, caption: str) -> SendResult:
        """Send image from URL."""
    
    async def get_chat_info(self, chat_id: str) -> dict:
        """Return {name, type, chat_id, members...}."""
    
    async def handle_message(self, event: MessageEvent) -> None:
        """Process inbound message (called by adapter internally)."""

Optional Methods

  • send_document(chat_id, file_path, caption) — File attachment
  • send_voice(chat_id, file_path) — Audio message
  • send_video(chat_id, file_path, caption) — Video
  • send_animation(chat_id, file_path, caption) — GIF/animation
  • send_image_file(chat_id, file_path, caption) — Local image

Adding a New Platform

See gateway/platforms/ADDING_A_PLATFORM.md (8,826 bytes). Key steps:

  1. Create adapter → gateway/platforms/<platform>.py, subclass BasePlatformAdapter
  2. Add enum → gateway/config.py, extend Platform enum
  3. Register in factory → gateway/run.py, add case in _create_adapter()
  4. Add auth map → gateway/run.py, if using custom auth (OAuth, tokens)
  5. Add CLI setup → hermes_cli/main.py, subcommand for platform-specific config
  6. Implement message routing → Use self.build_source() for session keys
  7. Handle media → Use cache_image_from_bytes(), cache_audio_from_bytes() for attachments
  8. Logging → Redact secrets in all log output
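The subclassing step can be illustrated with a toy in-memory adapter. The base class here is a reduced stand-in for gateway/platforms/base.py (only two of the required methods), so this is a sketch of the shape, not the real interface:

```python
import asyncio
from abc import ABC, abstractmethod

class BasePlatformAdapter(ABC):
    """Reduced stand-in for the real base class, for illustration only."""
    @abstractmethod
    async def connect(self) -> bool: ...
    @abstractmethod
    async def send(self, chat_id: str, text: str, **opts) -> dict: ...

class EchoAdapter(BasePlatformAdapter):
    """Toy adapter: 'delivers' by recording messages in memory."""
    def __init__(self):
        self.outbox = []

    async def connect(self) -> bool:
        # A real adapter would open a websocket or start long polling here
        return True

    async def send(self, chat_id: str, text: str, **opts) -> dict:
        self.outbox.append((chat_id, text))
        return {"ok": True, "message_id": str(len(self.outbox))}

async def demo():
    adapter = EchoAdapter()
    assert await adapter.connect()
    return await adapter.send("chat-1", "hello")
```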

6. Plugin System

Plugin storage: ~/.hermes/plugins/<name>/ (user), <repo>/plugins/<name>/ (bundled)
Manifest: Each plugin requires plugin.yaml + __init__.py
Discovery: hermes_cli/plugins.py lines 1–300
Hook execution: hermes_cli/plugins.py lines 400–600 (invoke_hook())

Plugin Manifest (plugin.yaml)

name: my-plugin
version: 1.0.0
description: What this plugin does
author: Your Name
requires_env:
  - SOME_API_KEY
  - secret_token: "OPTIONAL_SECRET"
provides_tools:
  - custom_tool_name
provides_hooks:
  - pre_llm_call
  - post_tool_call

Plugin Entry Point (__init__.py)

def register(ctx: PluginContext):
    """Called once during plugin load."""
    ctx.register_tool(my_tool)
    ctx.on("pre_llm_call", my_hook_handler)

Valid Hooks

| Hook | Fired When | Signature |
| --- | --- | --- |
| pre_tool_call | Before tool execution | (tool_name, args, **kwargs) |
| post_tool_call | After tool returns | (tool_name, result, **kwargs) |
| transform_tool_result | Transform tool output | (result: str) -> str |
| pre_llm_call | Before model inference | (messages, model, **kwargs) |
| post_llm_call | After model response | (response, **kwargs) |
| transform_terminal_output | Reformat terminal output | (output: str) -> str |
| pre_api_request | Before HTTP request | (method, url, **kwargs) |
| post_api_request | After HTTP response | (response, **kwargs) |
| on_session_start | Session begins | (session_id, **kwargs) |
| on_session_end | Session closes | (session_id, **kwargs) |
| on_session_finalize | Before persistence | (session_id, messages, **kwargs) |
| on_session_reset | Clear session | (session_id, **kwargs) |
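An invoke_hook-style dispatcher for these hooks might look like the sketch below. The class and method names are hypothetical; the only assumption taken from the table is that transform_* hooks rewrite a string while the others are fire-and-forget notifications:

```python
class HookBus:
    """Hypothetical sketch of plugin hook dispatch.

    Handlers registered for a hook run in registration order. For
    transform_* hooks, each handler's return value feeds the next,
    forming a filter chain.
    """
    def __init__(self):
        self.handlers = {}

    def on(self, hook: str, fn):
        self.handlers.setdefault(hook, []).append(fn)

    def invoke(self, hook: str, *args, **kwargs):
        # Notification-style hooks: call every handler, ignore returns
        for fn in self.handlers.get(hook, []):
            fn(*args, **kwargs)

    def transform(self, hook: str, value: str) -> str:
        # Filter-style hooks: chain each handler's output into the next
        for fn in self.handlers.get(hook, []):
            value = fn(value)
        return value
```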

Plugin Loading

  1. Bundled plugins (<repo>/plugins/*/) + excluded subdirs (memory/, context_engine/)
  2. User plugins (~/.hermes/plugins/*/)
  3. Project plugins (./.hermes/plugins/*/, opt-in via HERMES_ENABLE_PROJECT_PLUGINS)
  4. Pip entry-point plugins (exposed via hermes_agent.plugins entry group)

Later sources override earlier ones (name collisions).
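The override rule reduces to an ordered dict merge; a minimal sketch (hypothetical helper name), assuming each source maps plugin name to its loaded definition:

```python
def merge_plugin_sources(*sources: dict) -> dict:
    """Sketch of the precedence rule above: later sources override earlier
    ones on name collision (bundled -> user -> project -> pip entry points)."""
    merged = {}
    for source in sources:     # earliest (lowest-precedence) first
        merged.update(source)  # later dicts win on duplicate names
    return merged
```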


7. Tool & Toolset Interface

Tool registry: tools/ directory (40+ built-in tools)
Tool schema: Pydantic models + docstring introspection
Discovery: hermes_cli/tools_config.py (registry enumeration)

Standard Tool Pattern

# In tools/custom_tool.py
from typing import Annotated
from pydantic import BaseModel, Field

class CustomToolInput(BaseModel):
    query: str = Field(..., description="Search query")
    limit: int = Field(10, description="Max results")

def custom_tool(input: Annotated[CustomToolInput, "Tool name"]) -> str:
    """Tool description.
    
    Long description for system prompt (markdown).
    """
    # implementation goes here; must return a string for the agent loop
    return f"results for {input.query!r} (limit {input.limit})"

Tool Schema Generation

  1. Docstring parsing → Extract description
  2. Type hints → Build JSON Schema from Pydantic models
  3. Field descriptions → Pulled from Field(..., description=...)
  4. Registry injection → Tool available as both:
    • Built-in (direct function call)
    • MCP-prefixed (if exposed via server mode)
    • Plugin-registered (dynamic at runtime)
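Steps 1–3 can be illustrated with a stdlib-only introspection sketch. Hermes itself derives schemas from Pydantic models; this hypothetical helper uses bare type hints and a docstring instead, so it shows the idea rather than the real mechanism:

```python
import inspect
import typing

# Minimal Python-type -> JSON Schema type mapping (illustrative)
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_signature(fn) -> dict:
    """Sketch of steps 1-3: docstring -> description, type hints -> JSON
    Schema types, parameters without defaults -> required fields."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = (fn.__doc__ or "").strip()
    return {
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }
```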

Built-in Toolsets

| Toolset | Tools | Enabled By Default |
| --- | --- | --- |
| web | web_search, web_tools, browser | Yes |
| terminal | bash, python, code_execution | Conditional (sandbox) |
| files | file_operations, file_tools | Yes |
| vision | image_analysis, screenshot | Conditional (vision models) |
| audio | tts, speech_recognition | Conditional (audio hardware) |
| image_gen | image_generation | Conditional (API keys) |
| code | git_tools, github_tools | Conditional (auth) |

8. Skill Interface

Skill storage: ~/.hermes/skills/<category>/<skill-name>/ (user), <repo>/skills/ (bundled)
Manifest: SKILL.md (frontmatter + markdown)
Discovery: hermes_cli/skills_hub.py (hub + local detection)

SKILL.md Frontmatter

---
name: skill-identifier
description: One-line summary
version: 1.0.0
author: Author Name
license: MIT
metadata:
  hermes:
    tags: [python, deployment, ci-cd]
    related_skills: [other-skill-id, ...]
    requires: [python-3.11, docker]
    cost: "high"  # or "medium", "low", "free"
---

Skill Loading Modes

  1. Pre-armed — Bundled + user skills loaded at startup (in system prompt)
  2. Lazy-loaded — Hub skills loaded on-demand (/skill-name command)
  3. Indexed — FTS search across all available skills (local + Hub)
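The "Indexed" mode maps naturally onto SQLite FTS5 (the same engine named for session search elsewhere in this document). The table layout below is illustrative, not the actual Hermes schema:

```python
import sqlite3

def build_skill_index(skills: list[tuple[str, str]]) -> sqlite3.Connection:
    """Sketch: index (name, description) pairs in an FTS5 virtual table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE skills USING fts5(name, description)")
    db.executemany("INSERT INTO skills VALUES (?, ?)", skills)
    return db

def search_skills(db: sqlite3.Connection, query: str) -> list[str]:
    """Full-text search across both columns, best matches first."""
    rows = db.execute(
        "SELECT name FROM skills WHERE skills MATCH ? ORDER BY rank", (query,)
    )
    return [name for (name,) in rows]
```

This requires a CPython build with the FTS5 extension compiled in, which is the case for standard distributions.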

Skill Categories (Standard Convention)

skills/
├── github/
│   ├── github-auth/
│   ├── github-pr-workflow/
│   ├── github-code-review/
│   └── github-issues/
├── productivity/
│   ├── calendar-integration/
│   ├── task-automation/
│   └── email-management/
├── research/
│   ├── arxiv-search/
│   └── literature-summary/
└── ...

Skill Interaction

Users invoke skills via:

  • /skill-name — Load and execute skill
  • /skills — Browse available skills
  • /skills search <query> — FTS search
  • Auto-loading when agent detects task matches skill domain

9. Webhook Subscriptions & Cron Scheduling

Webhook Subscriptions

Storage: ~/.hermes/webhook_subscriptions.json
Config: hermes_cli/webhook.py lines 1–200
Platform adapter: gateway/platforms/webhook.py (HTTP listener)

hermes webhook subscribe my-github-events \
  --channel telegram:123456 \
  --description "GitHub push notifications"

Subscription object:

{
  "name": "my-github-events",
  "channel": "telegram:123456",
  "route": "/webhooks/my-github-events",
  "secret": "hmac-secret-auto-generated",
  "description": "GitHub push notifications",
  "created_at": "2026-04-20T...",
  "last_received": "2026-04-20T..."
}

Delivery: Hot-reloaded without gateway restart; webhook platform listens on host:port.
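Given the auto-generated HMAC secret in the subscription object, an inbound payload would typically be verified along these lines. The header name and hex-digest scheme are assumptions for illustration, not the documented Hermes wire format:

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_header: str) -> bool:
    """Sketch: recompute the HMAC-SHA256 of the raw body and compare it to
    the sender's signature in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```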

Cron Scheduling

Storage: ~/.hermes/cron/jobs.json
Scheduler daemon: cron/scheduler.py (background process)
CLI interface: hermes_cli/cron.py + cron/jobs.py

hermes cron create \
  --schedule "0 9 * * *" \
  --prompt "Generate daily standup report" \
  --deliver slack:#reports

Job object:

{
  "id": "job-uuid",
  "name": "daily-standup",
  "schedule": {"type": "cron", "value": "0 9 * * *"},
  "prompt": "Generate daily standup report",
  "deliver": ["slack:#reports", "local"],
  "enabled": true,
  "skills": ["productivity/task-summary"],
  "script": null,
  "next_run_at": "2026-04-21T09:00:00Z",
  "last_run_at": "2026-04-20T09:00:00Z",
  "last_status": "success",
  "repeat": {"times": null, "completed": 0}
}

Delivery modes:

  • local — Message to agent home chat (CLI or primary messaging platform)
  • <platform>:<channel> — Route to specific platform/group
  • email:<recipient> — Email delivery
  • webhook:<url> — POST result to external webhook

10. Python API for Embedding

Status: Hermes is primarily a CLI tool; there is no supported public Python API for embedding.

Partial Python Access (Internal Use Only)

The codebase contains internal Python modules that could be imported:

# Not officially supported; API may change between versions
from acp_adapter.session import SessionManager
from agent.anthropic_adapter import AnthropicAdapter  # or other model adapters
from gateway.platforms.base import BasePlatformAdapter
from hermes_cli.config import load_config

Recommended Path (Official)

For embedding Hermes in Python applications:

  1. Use ACP client library (if available)

    • Connect to running Hermes ACP server
    • Send prompts via protocol
    • Receive streamed responses
  2. Subprocess mode

    import subprocess
    import json
    
    result = subprocess.run(
        ["hermes", "chat", "-q", "your prompt"],
        capture_output=True, text=True
    )
    # parse stdout
  3. Webhook callback

    • Have cron or gateway POST results to your API
    • Query Hermes via HTTP (webhook subscriber)

Summary of Integration Points

| Integration | Type | Entry Point | Config |
| --- | --- | --- | --- |
| CLI | Commands | hermes <cmd> | hermes_cli/main.py |
| MCP Client | Protocol | mcp_servers in config.yaml | hermes_cli/mcp_config.py |
| ACP Server | Protocol | Editor → localhost:5000 | acp_adapter/ |
| Gateway | Messaging | hermes gateway run | ~/.hermes/config.yaml (platforms) |
| Plugins | System | ~/.hermes/plugins/<name>/ | plugin.yaml + __init__.py |
| Tools | Functions | tools/ directory | tools/ + toolset config |
| Skills | Docs | ~/.hermes/skills/ | SKILL.md frontmatter |
| Webhooks | HTTP | /webhooks/<name> | webhook_subscriptions.json |
| Cron | Scheduling | hermes cron ... | ~/.hermes/cron/jobs.json |

Document compiled: 2026-04-20
Scope: Medium thoroughness — covers all major integration surfaces with file:line citations for implementation details.

Hermes Agent: Philosophy from Issues, Commits, and Design Artifacts

Date: April 20, 2026 | Scope: Commit history (last 100), Issues (open + closed, recent 30), Releases (v0.6–v0.10), Documentation (README, CONTRIBUTING, SECURITY, AGENTS.md)


1. Release Rhythm: Every 1–5 Days, Hyperactive Iteration

Cadence: v0.2.0 (Mar 12) → v0.10.0 (Apr 16) = 8 releases in 35 days = ~4-day average interval.

What warrants a release:

  • Single major feature gets shipped (Profiles in v0.6, Memory providers in v0.7, Tool Gateway in v0.10)
  • 50–180+ commits bundled; 16–60+ issues resolved per release
  • Each release has explicit highlights + 🏗️/📱/🔧 subsections organized by component

Changelog voice: Declarative, technical, evidence-linked. Every highlight cites PR numbers. Example from v0.10:

"Nous Tool Gateway — Paid Nous Portal subscribers now get automatic access to web search (Firecrawl), image generation (FAL / FLUX 2 Pro), text-to-speech (OpenAI TTS), and browser automation (Browser Use) through their existing subscription. No separate API keys needed."

No filler. No "improvements and bug fixes." Ship notes read like a technical spec, not marketing copy. This signals: ship fast, justify by impact, keep a running record.


2. Commit Message Discipline: Conventional Commits + Narrative Why

Pattern: Consistently type(scope): what across 100-commit sample.

Examples:

fix(tui): fix Linux Ctrl+C regression, remove double clipboard write
fix(agent): repair malformed tool_call arguments before API send
feat(plugins): make all plugins opt-in by default
chore(release): add jplew to AUTHOR_MAP

Observations:

  • Scope-first design: fix(tui), fix(gateway), feat(skills), chore(release) — tells you impact zone immediately
  • Atomic changes: Most commits are single-fix or single-feature; few merges visible in recent history (dcd763c Merge pull request #10125)
  • No narrative why-statements in commit bodies (visible log is subject-only) — suggests history is self-documenting via: (a) type prefix signals risk (fix < feat < chore), (b) scope signals blast radius, (c) PR numbers link to discussion
  • Author mapping: chore(release): add [name] to AUTHOR_MAP appears regularly — deliberate credit ritual, tribe-building signal

Philosophy revealed: Speed + clarity. Conventional Commits enforce scannability. Atomic commits enable safe rollback and bisect. No narrative prose in message bodies = trust the code to speak for itself; discussions live in PRs, not commit messages.
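The type(scope): subject convention described above is machine-checkable with a short regex; a sketch (simplified, ignoring optional Conventional Commits features like `!` breaking-change markers and scopeless subjects):

```python
import re

COMMIT_RE = re.compile(r"^(?P<type>\w+)\((?P<scope>[^)]+)\):\s+(?P<subject>.+)$")

def parse_commit(subject_line: str):
    """Sketch: split a conventional-commit subject into type, scope, subject."""
    m = COMMIT_RE.match(subject_line)
    return m.groupdict() if m else None
```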


3. Issues: What the Team Argues About

Sample: 30 recent (open + closed) issues, April 20, 2026.

Themes & frequency:

| Theme | Count | Examples | Signal |
| --- | --- | --- | --- |
| Provider/model support | 8 | #13061 (normalize_model_name breaks custom providers), #13042 (Ollama glm-5.1 malformed JSON), #13031 (Feishu gateway tool execution), #12835 (kimi-k2.5 temperature mode), #12790 (max_tokens fallback incomplete) | Polyglot-first: Hermes prioritizes multi-provider compatibility above all. Every model, every endpoint variant, must work. Bugs here are critical. |
| Gateway reliability | 6 | #13081 (socket directory glob), #13050 (Discord username surfacing), #13033 (Linux terminal freeze on paste), #13027 (skill_view HERMES_SESSION_PLATFORM check), #12868 (plugin loading post-restart) | Always-on assumption: Messaging gateways are production infrastructure, not toys. Freezes, race conditions, and session loss are P0. |
| Skills & autonomy | 5 | #13075 (memory/skill nudge counter), #13060 (smalltalk async subagents), #13041 (delegate_task idling), #13028 (_save_platform_tools stale state) | Learning loop is core: Autonomous skill creation and subagent delegation are not nice-to-haves; failures here block the main thesis. |
| Config & migration | 4 | #13024 (setup wizard misclassifies providers), #13025 (OpenClaw migration stale checks), #12881 (UnicodeDecodeError on update) | Low-entropy user onboarding: Team invests heavily in setup, migration, config clarity — new users should not hit Python tracebacks. |
| Session/memory search | 3 | #13056 (session search time-bounded queries), #13079 (read_file dedup cache pollution) | FTS5 is central: Full-text search and session recall are differentiators; bugs here are visibility breaches. |
| Security & approval | 2 | (Mostly closed; no current open issues) | Mature stance: Security is baked in; few incoming reports suggest approval system + secret redaction are working. |
| Docs & help | 0 | (none) | Docs debt managed silently: No open doc issues; docs are fixed in PRs alongside features. |

What gets rejected/closed-wontfix: None visible in sample. Instead, issues get rapid triage and hotfix PRs. Example: #13059 (Chinese: custom provider model ID corruption) opened Apr 20 14:19, closed 14:25 — 6-minute fix cycle. Triaging for speed, not gatekeeping.


4. Pull Requests: Merge Pattern & Reviewer Philosophy

Sample: 20 recent PRs (all open, i.e., pending review/merge).

Patterns:

| Pattern | Count | Examples | Signal |
| --- | --- | --- | --- |
| Bug fix + test | 10 | #13080 (file-tools TERMINAL_CWD), #13078 (null/scalar in get_disabled_skills), #13077 (stream consumer first message), #13073 (transport types + Anthropic normalize) | Tests-as-spec: Fixes include test additions. Reviewers validate via test changes, not code inspection alone. |
| Feature with integration | 6 | #13082 (signet crypto audit trail), #13070 (Docker env var overrides), #13066 (Feishu media delivery), #13063 (Discord history backfill) | Completeness bar: Features ship with full integration: config, docs, tests, example commands. Partial implementations rejected. |
| Cross-platform compat | 3 | #13064 (right-click paste), #13074 (QQCloseError backoff), #13073 (transport types) | "Works on my machine" is not acceptable: Platform divergence (macOS/Linux/Windows, TUI/CLI/gateway, all providers) is actively hunted and fixed. |
| Reasoning/thinking model support | 2 | #13076 (api_server load reasoning config), #13071 (Copilot ACP async + streaming) | Extended thinking is first-class: o1-like models are treated as a language-level feature, not an afterthought. |
| Closed (rejected/duplicate) | 1 | #13069 (duplicate of #13070) | Deduplication happens before full review. Prevents churn; suggests strong issue triage discipline. |

No visible rejection comments. Open PRs are either in the merge queue or waiting for CI/feedback. No evidence of "we won't take this" — instead, community PRs get integrated or redirected to Skills Hub.


5. Philosophy Consistency Check: Stated vs. Revealed

README's Claims

"The only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions."

Evidence in artifacts:

  • Commit: feat(plugins): convert disk-guardian skill into a bundled plugin (068b2248) — skills auto-generated and bundled
  • Issue #13075: "Memory/Skill Nudge Counter Issues" — nudging is a tracked, prioritized feature
  • Release v0.7: "Pluggable Memory Provider Interface" + Honcho integration — memory is extensible
  • Session search: #13056 (time-bounded queries), FTS5 in hermes_state.py — past conversation search is core
  • Claim validated.

README's Claims

"Run it on a $5 VPS, a GPU cluster, or serverless infrastructure... It's not tied to your laptop."

Evidence in artifacts:

  • Release v0.6: "Profiles — Multi-Instance Hermes", "Docker Container", "Modal and Docker container skills/credentials mounting"
  • Release v0.7: "API Server Session Continuity" for Open WebUI integration
  • Release v0.6: "Feishu/Lark + WeCom Platform Support" (multiple messaging platforms)
  • Commit: feat(whatsapp): implement send_voice (ed76185c) — multi-platform audio support
  • Claim validated.

CONTRIBUTING.md's Priority Ladder

  1. Bug fixes (crashes, incorrect behavior, data loss)
  2. Cross-platform compatibility
  3. Security hardening
  4. Performance & robustness
  5. New skills
  6. New tools
  7. Documentation

Evidence in artifacts:

  • Open issue #13081: "glob pattern doesn't match socket directories, causing daemon startup failure" — crash prioritized
  • Open issue #13033: "setup freezes terminal on Linux" — cross-platform compat bug is open
  • Closed issue #12881: "UnicodeDecodeError on config migration" — fixed within 24h, crash behavior
  • Release v0.7: "Gateway Hardening" section (5 PRs for race conditions, flood control, compression death spirals)
  • Release v0.7: "Security: Secret Exfiltration Blocking" (secret patterns, credential directory protections)
  • Commits show: mostly fix(*) and occasional feat(*), rarely docs(*)
  • Hierarchy is operational, not aspirational.

SECURITY.md's Trust Model

"Single Tenant: The system protects the operator from LLM actions, not from malicious co-tenants."

Evidence in artifacts:

  • Approval system is configurable: approvals.mode: "on" (default), "auto", "off" — operator choice, not enforcement
  • Issue #13056: Session search lacks time-bounded queries — user owns their own search results
  • Commits: fix(agent): repair malformed tool_call arguments (9eeaaa4f) — LLM output validation is proactive
  • Trust model aligns with single-user, self-service assumption.

AGENTS.md's Developer Ethic

"AIAgent class — core conversation loop, tool dispatch, session persistence" (minimal docs, code is canonical)

Evidence in artifacts:

  • AGENTS.md is light on narrative, heavy on code paths and class signatures
  • git log shows atomic, type-prefixed commits — code structure must be self-evident
  • Release notes cite file changes + PR numbers, not architecture essays
  • Assumes developers read code-first, prose second.

6. The One Thing Only Hermes Would Say

Claim: "Hermes is the only agent with a closed learning loop that works across all platforms and all models simultaneously, without lock-in."

Breakdown:

  1. Closed learning loop: Autonomous skill creation from conversations (#13075 nudge counter), FTS5 session search (#13056), procedural memory with pluggable backends (v0.7), subagent delegation with memory isolation (SECURITY.md). No other agent framework bundles all three.

  2. Works across all models & providers simultaneously: 100+ commits in last 30 days are provider/model-specific fixes (normalize_model_name, max_tokens fallback, temperature mode detection, context length resolution). v0.7 release alone adds 6 new provider patterns (Anthropic long-context tier 429, Fireworks context detection, DashScope international, Bearer auth for MiniMax). This is obsessive-compulsive model compatibility work, not an afterthought. Other frameworks pick 2–3 providers and call it done.

  3. Works across all platforms: Telegram (webhook + polling), Discord (multi-workspace OAuth), Slack, WhatsApp, Signal, Matrix, Mattermost, Feishu/Lark, WeCom, Home Assistant, Email, CLI, TUI, API server. v0.6 added Feishu + WeCom in a single release. No multi-platform agent does this.

  4. Zero lock-in: Can switch models via hermes model [provider:model]; can swap providers via fallback chains; can migrate memory via pluggable providers (Honcho as official reference implementation); can run skills locally or on Modal/Docker; can deploy to any terminal backend. README explicitly names 10+ inference endpoints. This is not marketing; every claim is enforced by tests and releases.

Evidence:

  • Commit cadence: 100+ fixes in last 100 commits = tight feedback loop on real breakage, not wishful features.
  • Issue triage: #13059 (custom provider corruption) fixed in 6 minutes = production incident response, not hobby project.
  • Release velocity: v0.6–v0.10 in 35 days, each release with 50–180+ commits = team moving fast on observable friction.
  • Architecture: Single AIAgent class, toolsets.py for platform abstraction, tools/registry.py for dynamic tool discovery, hermes_state.py for unified session storage = no special cases, no platform-specific forks.

Why this claim is defensible:

  • Claude Desktop (LLM IDE plugin), ChatGPT+ (web-only), Copilot (Microsoft-locked), and Cursor (editor-only) are all single-platform, single-model. Hermes refuses single-platform constraints at the code level.
  • OpenRouter, Together, Anyscale do provider aggregation, but only for inference. They don't ship a conversation UI, session search, memory system, skill creation, multi-platform gateway, terminal backend abstraction, and approval system that all work together across providers.
  • The learning loop (skills + FTS5 search + nudges) is novel to Hermes. Retrieval-augmented generation is standard; procedural skill generation from execution traces is not.

7. What the Issues Reveal About Team Pressure & Resistance

Community pressure vs. Team decisions:

| Issue | Status | Team Response | Signal |
| --- | --- | --- | --- |
| #13049 (XMPP channel) | OPEN | No response yet | Team doesn't auto-accept every platform request. XMPP is niche; Hermes gates on "broadly useful" (CONTRIBUTING.md). |
| #13041 (delegate_task idle timeout) | OPEN | Waiting for fix | Subagent autonomy is broken; team is aware and triaging. Not deprioritized. |
| #13060 (smalltalk async subagents) | OPEN | Feature request labeled | Community asking for async subagent UX (side conversations). Team tracking but not committed. |
| #13072 (CLI auto-queue mode) | OPEN | Feature request | Users want queueing behavior; team hasn't shipped yet. Not a priority vs. crashes. |
| v0.7 Pluggable Memory | RELEASED | Shipped Honcho integration | Community pressure on memory backends → formalized provider ABC → released. Team listened. |
| v0.6 Ordered Fallback Providers | RELEASED | Shipped, closes #1734 | Issue #1734 is 10+ months old; finally resolved. Team prioritizes based on data, not age. |

Team stance: Hear community, ship high-ROI fixes (crash bugs, provider compat), gate speculative features (XMPP, auto-queue). Not dismissive; not obligated to every request.


Summary: The Observable Philosophy

| Pillar | Evidence | Implication |
|---|---|---|
| Speed over perfection | 4-day release cycle, 6-minute hotfix triage, 100+ commits per release | Iterate fast, break stuff carefully (with tests), learn from production |
| Compatibility is correctness | 8+ open provider bugs, v0.7 release has 20+ provider-specific fixes | Support all models, all endpoints, all variants. Parity across providers is a feature, not a bug list. |
| Autonomy is the goal | Learning loop (skills + memory + search), subagent delegation, FTS5 session recall are prioritized | Agent should know what it learned, improve over time, and think independently. User is a collaborator, not a command source. |
| Multi-platform is mandatory | 10+ messaging platforms, multiple terminal backends, serverless + container + local execution | Hermes is not a CLI tool. It's a distributed agent runtime that happens to have a CLI. |
| Trust is individual | Single-tenant model, operator-owned approval gates, credential isolation, sandbox defaults | Hermes trusts you, not the cloud. You own your keys, your sessions, your skills. |
| Code is the spec | Light documentation, atomic commits, type-prefixed scope, PR-driven discussions | Read the code. Run the tests. Ship it. Narratives are secondary to executable truth. |

One final signal: The release notes quote PR numbers obsessively. Example: v0.7 has 50+ PR citations in 20 lines of highlights. This is radical transparency. Readers can click any claim and see the implementation. Other projects write prose; Hermes writes audit trails.

Hermes Agent Testing Patterns

1. Test Structure

Test suite: 673 Python files across 16 subfolders. Total: ~200K LOC of test code.

Directory layout:

  • tests/gateway/ (173 files, 66K LOC) — platform adapters (Telegram, Discord, Slack, Matrix, API server)
  • tests/tools/ (149 files, 51K LOC) — tool execution, skill manager, code execution, MCP
  • tests/hermes_cli/ (119 files, 36K LOC) — CLI behaviors, commands, config parsing
  • tests/run_agent/ (55 files, 18K LOC) — agent loop, message handling, context compression
  • tests/agent/ (44 files, 19K LOC) — LLM adapters (Anthropic, OpenAI, Bedrock, etc.)
  • tests/cli/ (44 files, 9K LOC) — older CLI entry points
  • tests/integration/ (8 files) — batch runner, checkpoint resumption, voice channels
  • tests/e2e/ (1 file, +conftest) — full gateway pipeline command dispatch
  • tests/skills/, tests/plugins/, tests/cron/ — optional integrations

Test-to-source coverage ratios (LOC test / LOC source):

  • gateway: 1.24× (most heavily tested)
  • run_agent: 1.45× (agent loop critical path)
  • tools: 1.10× (broad tool coverage)
  • agent: 0.90× (LLM adapter coverage)
  • hermes_cli: 0.70× (CLI comparatively under-tested)
  • skills: 0.29× (marked as lightly tested)

Undertested: skills/ directory (only 6 tests for 7K LOC), optional integrations (honcho, daytona, modal).


2. Test Framework & Runner

Framework: pytest 9.0+, pytest-asyncio, pytest-xdist (parallel -n auto)

Configuration (pyproject.toml:131-136):

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = ["integration: marks tests requiring external services"]
addopts = "-m 'not integration' -n auto"
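The `-m 'not integration'` filter means marked tests are deselected in every default pytest run and must be selected explicitly with `pytest -m integration`. A minimal sketch of how the marker behaves (test names and config contents here are illustrative, not from the repo):

```python
import pytest

# Collected by the default suite: addopts applies -m 'not integration'.
def test_config_defaults_parse():
    config = {"model": "anthropic:claude", "toolsets": []}  # stand-in config
    assert config["model"].startswith("anthropic")

# Deselected by default; run explicitly with: pytest -m integration
@pytest.mark.integration
def test_real_provider_roundtrip():
    pytest.skip("requires live provider credentials")
```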

CI/CD (.github/workflows/tests.yml):

  • Single job runs all unit tests (excluding integration/, e2e/) on Ubuntu with Python 3.11
  • Separate e2e job runs tests/e2e/ only (marked @pytest.mark.asyncio)
  • Python version: 3.11 only (no 3.12 matrix)
  • Timeout: 20min for unit tests, 10min for e2e
  • Coverage config: None (no explicit .coverage, no CI report)

Dependencies:

  • pytest>=9.0.2,<10
  • pytest-asyncio>=1.3.0,<2 (async test harness)
  • pytest-xdist>=3.0,<4 (parallel execution)
  • No pytest-cov, no coverage thresholds enforced in CI

Test isolation (hermetic environment, conftest.py:192-338):

  • All credential env vars (API keys, tokens, passwords) unset per test
  • HERMES_HOME redirected to per-test tmpdir (prevents ~/.hermes leakage)
  • TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0 (deterministic datetime/locale)
  • AWS IMDS disabled (avoids 2s metadata service timeout)
  • Plugin singleton reset between tests
  • 30-second per-test timeout (SIGALRM on Unix, no-op on Windows)
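The isolation layers above can be sketched as two helpers, the cores of what conftest.py wraps in autouse fixtures (a simplified reconstruction, not the repo's exact code; the credential-variable list is an illustrative subset):

```python
import os
import signal
from contextlib import contextmanager

CREDENTIAL_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")  # illustrative subset

@contextmanager
def hermetic_environment(tmp_home):
    """Scrub credentials, redirect state, pin locale; restore on exit."""
    saved = {k: os.environ.get(k)
             for k in (*CREDENTIAL_VARS, "HERMES_HOME", "TZ", "LANG")}
    try:
        for var in CREDENTIAL_VARS:
            os.environ.pop(var, None)              # no test reaches a real provider
        os.environ["HERMES_HOME"] = str(tmp_home)  # never touch ~/.hermes
        os.environ["TZ"] = "UTC"                   # deterministic datetimes
        os.environ["LANG"] = "C.UTF-8"             # deterministic locale
        yield
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value

@contextmanager
def enforce_timeout(seconds=30):
    """Per-test watchdog: SIGALRM on Unix, no-op where unavailable."""
    if not hasattr(signal, "SIGALRM"):
        yield
        return
    def _timeout(signum, frame):
        raise TimeoutError(f"test exceeded {seconds}s")
    old = signal.signal(signal.SIGALRM, _timeout)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old)
```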

3. Mocking & Fixtures Patterns

LLM mocking strategy:

  • Custom doubles (no responses/respx/pytest-httpx). E.g., restart_test_helpers.py:71-108 manually constructs GatewayRunner with:
    • runner._update_runtime_status = MagicMock()
    • runner.hooks.emit = AsyncMock()
    • runner.session_store = MagicMock() with ._entries = {}
  • Mock uses unittest.mock (standard library)
  • LLM calls are stubbed at the adapter level, not the HTTP layer

External service stubs:

  • Telegram: sys.modules mock (pre-test, gateway/conftest.py:21-62) provides fake ChatType constants, error classes
  • Discord: comprehensive sys.modules mock (gateway/conftest.py:65-142) covering discord.Intents, discord.app_commands group/command registration
  • Platform adapters: RestartTestAdapter (base class for Telegram mock) overrides send(), connect(), disconnect(), get_chat_info()

Fixture patterns (conftest.py):

  • _hermetic_environment (autouse) — environment isolation
  • tmp_dir(tmp_path) — per-test temp directory
  • mock_config() — minimal hermes config dict (model, toolsets, terminal backend)
  • _ensure_current_event_loop() (autouse) — creates event loop for sync tests calling asyncio.get_event_loop().run_until_complete()
  • _enforce_test_timeout() (autouse) — 30s SIGALRM

Shared fixtures per subsystem:

  • tests/gateway/conftest.py — telegram/discord sys.modules mocks
  • tests/run_agent/conftest.py — 34 lines of undocumented helpers
  • tests/e2e/conftest.py — 266 lines covering full gateway setup

Snapshot testing: NOT used. No pytest-snapshot, no golden files.


4. What They Test HARD (3 Most-Tested Modules)

4a. gateway/api_server (1.24× coverage ratio)

Heavy test focus on OpenAI-compatible API server multimodal routing.

Example: tests/gateway/test_api_server.py:55-100 (ResponseStore LRU eviction):

class TestResponseStore:
    def test_lru_eviction(self):
        store = ResponseStore(max_size=3)
        store.put("resp_1", {"output": "one"})
        store.put("resp_2", {"output": "two"})
        store.put("resp_3", {"output": "three"})
        store.put("resp_4", {"output": "four"})
        assert store.get("resp_1") is None  # evicted (least recently used)
        assert store.get("resp_2") is not None
        assert len(store) == 3

Invariant defended: LRU cache correctness (response chaining via previous_response_id).
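The test pins down behavior that a minimal OrderedDict-based store would satisfy (a sketch, not the repo's actual ResponseStore implementation):

```python
from collections import OrderedDict

class ResponseStore:
    """Minimal LRU map: get() refreshes recency, put() evicts the oldest entry."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop least recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def __len__(self):
        return len(self._data)
```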

173 gateway test files defend:

  • Session routing across platforms (Telegram, Discord, Slack, Matrix, API server)
  • Message queueing during agent busy state
  • Approval/deny workflow authorization
  • Multi-platform skill registration

4b. tools/ (1.10× coverage)

149 files test tool execution safety and skill mutation.

Example: tests/tools/test_skill_improvements.py:46-68 (fuzzy patch skill):

def test_whitespace_trimmed_match(self):
    skill = "---\nname: ws-skill\n\n    def hello():\n        print(\"hi\")"
    _create_skill("ws-skill", skill)
    # Patch with no leading whitespace (LLM output shape)
    result = _patch_skill("ws-skill", "def hello():\n    print(\"hi\")", 
                          "def hello():\n    print(\"hello world\")")
    assert result["success"] is True
    content = (self.skills_dir / "ws-skill" / "SKILL.md").read_text()
    assert 'print("hello world")' in content

Invariant defended: Skill mutation is whitespace-agnostic (LLMs produce indentation variance).
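One way to get that invariant is to match lines with surrounding whitespace stripped, then splice the replacement back in at the indentation found at the match site (a sketch of the idea; `_patch_skill`'s internals may differ):

```python
from typing import Optional

def fuzzy_replace(content: str, old: str, new: str) -> Optional[str]:
    """Find `old` in `content` ignoring per-line indentation, then splice in
    `new` re-indented to match the first matched line."""
    content_lines = content.splitlines()
    old_lines = [line.strip() for line in old.splitlines()]
    for start in range(len(content_lines) - len(old_lines) + 1):
        window = content_lines[start:start + len(old_lines)]
        if [line.strip() for line in window] == old_lines:
            # Preserve the indentation of the first matched line.
            indent = window[0][:len(window[0]) - len(window[0].lstrip())]
            replacement = [indent + line for line in new.splitlines()]
            return "\n".join(content_lines[:start] + replacement
                             + content_lines[start + len(old_lines):])
    return None  # no whitespace-trimmed match found
```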

Key test areas:

  • Skill creation/patching with fuzzy matching (7 test files)
  • Code execution modes (POSIX-only, Windows workarounds)
  • File sync performance over SSH
  • Security: symlink traversal, OSV package checks, hidden directory traversal
  • MCP OAuth token refresh and cold-load cache expiry

4c. run_agent (1.45× coverage)

55 files test agent loop orchestration and message lifecycle.

Example: tests/test_hermes_state.py:24-47 (session CRUD):

def test_end_session_preserves_original_end_reason(self, db):
    """First end_reason wins — compression split must not be overwritten."""
    db.create_session(session_id="s1", source="cli")
    db.end_session("s1", end_reason="compression")
    first_ended_at = db.get_session("s1")["ended_at"]
    
    # Stale CLI holds old session_id and calls end_session() again
    time.sleep(0.01)
    db.end_session("s1", end_reason="resumed_other")
    
    session = db.get_session("s1")
    assert session["end_reason"] == "compression"  # First win, not overwritten
    assert session["ended_at"] == first_ended_at

Invariant defended: Session end reason is idempotent (prevents re-compression of already-compressed sessions).
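A guarded UPDATE is enough to get those first-write-wins semantics (a sketch against sqlite3; the repo's SessionDB schema will differ in detail):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    source     TEXT,
    ended_at   REAL,
    end_reason TEXT)""")

def end_session(session_id, end_reason):
    # The WHERE guard makes the call idempotent: only the first
    # end_reason/ended_at pair is ever written.
    conn.execute(
        "UPDATE sessions SET ended_at = ?, end_reason = ? "
        "WHERE session_id = ? AND ended_at IS NULL",
        (time.time(), end_reason, session_id),
    )

conn.execute("INSERT INTO sessions (session_id, source) VALUES ('s1', 'cli')")
end_session("s1", "compression")
end_session("s1", "resumed_other")  # stale caller: silently ignored
row = conn.execute(
    "SELECT end_reason FROM sessions WHERE session_id = 's1'").fetchone()
# row[0] == "compression": the first reason wins
```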

Key test areas:

  • SessionDB SQLite CRUD, FTS5 search, export
  • Message append, tool_call_count increments
  • Token count accumulation
  • Context compression lifecycle
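The FTS5 recall path amounts to a virtual table plus a MATCH query (a minimal sketch; table and column names are illustrative, not hermes_state.py's actual schema; requires an SQLite build with FTS5, the default in CPython's bundled SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE messages_fts USING fts5(session_id, role, content)")
rows = [
    ("s1", "user", "how do I rotate my API key"),
    ("s1", "assistant", "use hermes auth remove then auth add"),
    ("s2", "user", "schedule a daily cron summary"),
]
conn.executemany("INSERT INTO messages_fts VALUES (?, ?, ?)", rows)

# MATCH does tokenized full-text search; bm25() ranks hits by relevance.
hits = conn.execute(
    "SELECT session_id, content FROM messages_fts "
    "WHERE messages_fts MATCH ? ORDER BY bm25(messages_fts)",
    ("cron",),
).fetchall()
# hits → [("s2", "schedule a daily cron summary")]
```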

5. What They Test LIGHTLY or NOT AT ALL

Undertested Areas:

Skills auto-creation path: tests/skills/ has only 6 tests (telephony, memento cards, YouTube quiz, Google OAuth). No test for the core create-from-experience path that the framework markets as "self-improving". Skill fuzzy patching is tested, but not the full discovery→execution→patch→learn loop.

Gateway multi-platform routing: 173 gateway tests but no parametrized routing tests across >5 platforms. No test verifies that a message routes correctly across Telegram→Discord→Matrix for the same user. No test covers platform fallback when one adapter disconnects.

RL training pipeline (tinker-atropos): Only tests/tools/test_rl_training_tool.py (120 LOC) tests file handle cleanup and process termination. No test for actual RL training, reward signal, policy gradient, or convergence. The tinker-atropos/ directory has 0 test files in the main codebase (tinker-atropos is vendored, not tested end-to-end).

Evidence:

  • /tests/skills/ — 6 test files, 2.1K LOC vs 7.2K source LOC (0.29× ratio)
  • /tests/integration/ — no RL training, only batch runner + daytona/modal + voice
  • /tinker-atropos/ — 0 test files (directory exists for inference only)
  • /tests/e2e/ — 1 file, command dispatch only, no skill→agent loop

CLI parsing: Only basic smoke tests. No parametrized testing of 400+ CLI flags. No property-based fuzz of config YAML malformations.

Security testing gaps:

  • No prompt injection tests except test_cron_prompt_injection.py (cron-specific)
  • No test for tool hallucination (agent asks for non-existent tool)
  • No fuzzing of LLM output parsing (malformed JSON responses)
  • Only test_sql_injection.py (column name + query parameterization)
  • Only test_worktree_security.py + test_symlink_prefix_confusion.py (filesystem only)

6. Test Quality Signals

Integration Tests

  • tests/integration/ (8 files): batch runner checkpointing, daytona/modal terminal backends, voice channels, web tool interactions
  • No end-to-end agent task (e.g., "write code, test, refine" loop)
  • No cross-platform skill sharing test (skill created in Telegram, used in Discord)

Property-Based Testing

  • Not used. No hypothesis. No QuickCheck-style generators.
  • All tests use concrete fixtures and exhaustive case enumeration

Adversarial/Security Testing

  • Limited. Only regression tests, no proactive attack-surface exploration:
    • test_cron_prompt_injection.py — regex fuzzing for bypass patterns
    • test_sql_injection.py — assertion-only (no execution attack, only static checks)
    • test_tirith_security.py — Tirith-framework vulnerability scanning (external)
    • test_worktree_security.py — no Git command injection test, only symlink escape

Time-Sensitive Tests

  • test_timezone.py — 15.7K LOC of deterministic timezone/locale edge cases
  • test_hermes_logging.py — handler lifecycle with mock clock (no freezegun)
  • test_hermes_state.py — uses time.sleep(0.01) for idempotency checks
  • No use of freezegun or pytest-freezegun
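The time.sleep(0.01) pattern could be replaced by an injected clock; this is not something the suite does, just a sketch of the sleep-free alternative (all names hypothetical):

```python
class FakeClock:
    """Deterministic stand-in for time.time(): advances only when told to."""
    def __init__(self, start=1_000.0):
        self.now = start
    def time(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

class SessionRecord:
    """Toy session with first-write-wins end semantics, clock injected."""
    def __init__(self, clock):
        self._clock = clock
        self.ended_at = None
        self.end_reason = None
    def end(self, reason):
        if self.ended_at is None:  # first end wins
            self.ended_at = self._clock.time()
            self.end_reason = reason

clock = FakeClock()
record = SessionRecord(clock)
record.end("compression")
first = record.ended_at
clock.advance(0.01)            # no real sleep needed
record.end("resumed_other")
# record.end_reason == "compression" and record.ended_at == first
```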

Flaky Test Patches

  • 34 @pytest.mark.skipif markers (platform/privilege checks)
  • 0 @pytest.mark.flaky (no retries configured)
  • CI timeout 20min (suggests some slow tests, no flake reporting)

Flake evidence:

  • tests/honcho_plugin/test_client.py — 3 skipif markers (asyncio event loop issues)
  • tests/tools/test_ssh_environment.py — entire file conditionally skipped if SSH key unavailable
  • No recent history of "flaky test: reverted" commits in logs

7. The Philosophy of Their Test Suite

What the test suite reveals about trust in stochastic behavior:

The hermes-agent testing philosophy trades breadth of LLM output validation for depth of system invariants.

They Test Heavily:

  • Deterministic system paths — session lifecycle, message queueing, skill mutation, API routing
  • Mocked LLM calls — no real inference, only mock adapter responses
  • Configuration binding — 10K+ LOC of config parsing, default handling, precedence
  • Platform adapters — mock Telegram/Discord at the Python module level, assert send/receive correctness

They Don't Test (Implicitly Trust LLM):

  • Tool use correctness — no assertion that Claude actually uses the web_search tool when asked; only that the tool executes
  • Prompt quality — no tests that check agent reasoning, only that it doesn't crash
  • Context window management — compression is tested for correctness, but no test validates that it preserves semantic meaning

Core Assumption:

"If the system harness works correctly and the LLM is called with the right tools/prompt, the LLM will do the right thing."

This is pragmatic for an agentic framework — the framework can't predict LLM output, so it tests:

  1. Does the harness handle LLM stochasticity gracefully? (retry logic, fallback, error classification)
  2. Does the harness correctly isolate bad outputs? (tool sandbox, skill safety checks, prompt injection blocking)
  3. Do the system components compose correctly? (adapter → session → agent loop → storage)

Example of this philosophy in action:

  • test_hermes_state.py tests that session IDs survive compression — NOT that compression preserves conversation semantics (which only the LLM can judge)
  • tests/gateway/test_api_server.py tests that /v1/chat/completions returns 200 with correct schema — NOT that the response is sensible
  • test_skill_improvements.py tests that patches apply successfully — NOT that the patched skill works better than the original (depends on LLM feedback)

The suite is prescriptive (system must behave this way) but not proscriptive (we don't guard against bad LLM choices). This is the right engineering trade-off for an LLM-based system where the LLM is the source of truth.

hermes-agent Learning Index

Source

Explorations

2026-04-20 1946 (default · 3 agents)

  • Architecture — structure, entry points, core abstractions, design decisions
  • Code Snippets — 10 illustrative snippets (agent loop, tools, prompts, providers)
  • Quick Reference — install, quickstart, config, gotchas

Key insights:

  • Hermes is a self-improving agent with closed learning loops — Atropos RL feeds back into the agent via tinker-atropos submodule.
  • Provider-agnostic core: single abstraction over Anthropic / Bedrock / Gemini / OpenAI-compatible (~7+ providers).
  • Security-first prompt assembly scans context for injection patterns before handing off to the LLM.
  • Extensibility via Skills-as-Markdown (procedural memory), pluggable Tools registry, and MCP support.
  • SQLite + FTS5 state store backs memory and context compression.

2026-04-20 2218 (deep · 3 agents — philosophy lens)

  • Issues & Commits — release rhythm, commit style, argued themes, stated vs revealed philosophy
  • Testing — tests/ structure, coverage gaps, what they trust LLM to do vs what they assert
  • API Surface — CLI / MCP / ACP / gateway / plugin / tool / skill / webhook / cron / Python-embed

Key insights from the deep run:

  • Release rhythm: every ~4 days. Changelogs cite PRs, not marketing copy.
  • Commit discipline: strict conventional commits (type(scope): what), atomic, no narrative — code is the spec; discussions live in PRs.
  • What they argue about: provider compatibility (8+ bugs), gateway reliability (6), skill autonomy (5), config/onboarding (4), session search (3).
  • Tests are prescriptive, not proscriptive — they assert the harness works, they trust the LLM to use it correctly. Skills auto-creation loop and cross-platform routing have thin coverage; tinker-atropos has zero test files.
  • Python embedding is unsupported — hermes is a daemon/CLI, not a library. Use ACP or subprocess.
  • The one claim only hermes makes: "The only agent with a closed learning loop that works across all platforms and all models simultaneously, without lock-in."