@nazt
Created April 20, 2026 15:28
hermes-agent deep /learn — philosophy via issues+commits, testing patterns, API surface (2026-04-20)

Hermes Agent — API Surface Document

Project: NousResearch/hermes-agent
Date: 2026-04-20
Scope: Integration surfaces for external systems communicating with Hermes

This document catalogs the points where external systems can integrate with Hermes Agent: CLI commands, MCP server/client modes, ACP (Agent Client Protocol), messaging platforms, plugins, tools, skills, webhooks, cron, and Python embedding.


1. CLI Public Surface

Entry point: hermes command (alias: python cli.py or direct module invocation)
Parser setup: hermes_cli/main.py lines 6335–7820 (argparse subparsers)
Command execution callbacks: hermes_cli/main.py lines 1021–6272 (cmd_* functions)

Major Command Groups

Chat & Sessions

  • hermes → Interactive REPL (default; no subcommand)
  • hermes chat → Direct chat query (-q, --prompt)
  • hermes sessions list → List recent sessions
  • hermes sessions export → Export session history (JSON/markdown)
  • hermes sessions delete <session_id> → Remove a session
  • hermes sessions prune → Delete old sessions (--source, --days)

Model & Auth

  • hermes model [provider:model] → Switch active model
  • hermes login <provider> → Authenticate to external services (--scope)
  • hermes logout <provider> → Remove stored credentials
  • hermes auth add → Pool credentials (--label, --portal-url, --inference-url)
  • hermes auth list → List pooled credentials
  • hermes auth remove <provider> → Remove pooled credential

Gateway & Messaging Platforms

  • hermes gateway run → Start messaging gateway (Telegram, Discord, Slack, etc.)
  • hermes gateway start → Start as background service
  • hermes gateway stop → Stop background service
  • hermes gateway status → Check platform health (--deep)
  • hermes gateway setup → Interactive platform configuration wizard
  • hermes setup → Full system setup wizard

Skills & Toolsets

  • hermes tools → Configure enabled tools
  • hermes skills browse → Browse Skills Hub (categories, filters)
  • hermes skills search <query> → Search for skills (--limit)
  • hermes skills install <identifier> → Install a skill from registry
  • hermes skills list → Show installed skills
  • hermes skills tap → Manage skill sources (GitHub repos)

Plugins

  • hermes plugins install <url|name> → Add a plugin
  • hermes plugins list → Show installed plugins
  • hermes plugins enable <name> → Activate a plugin
  • hermes plugins disable <name> → Deactivate a plugin
  • hermes plugins remove <name> → Uninstall a plugin

Memory & Context

  • hermes memory → View and manage persistent memory (--export, --import)

Scheduled Tasks

  • hermes cron list → List all scheduled jobs (--all for disabled)
  • hermes cron create → Add a new scheduled task (--name, --deliver)
  • hermes cron edit <job_id> → Modify schedule or prompt
  • hermes cron pause <job_id> → Suspend a job
  • hermes cron resume <job_id> → Resume a paused job
  • hermes cron run <job_id> → Trigger immediately
  • hermes cron remove <job_id> → Delete a job

Webhooks

  • hermes webhook subscribe <name> → Register webhook route (--channel, --description)
  • hermes webhook list → Show active subscriptions
  • hermes webhook remove <name> → Unsubscribe
  • hermes webhook test <name> → Send test payload

MCP Integration

  • hermes mcp add <name> → Register MCP server (--command, --url, --auth)
  • hermes mcp list → Show configured servers
  • hermes mcp test <name> → Verify connectivity
  • hermes mcp remove <name> → Deregister server

System Commands

  • hermes config set <key> [value] → Set config option
  • hermes config show → Display current config
  • hermes config migrate → Upgrade config schema
  • hermes status → Show agent health
  • hermes doctor → Diagnose issues (dependencies, config, auth)
  • hermes version → Display version info
  • hermes update → Upgrade to latest release
  • hermes backup → Create encrypted backup archive
  • hermes import <zipfile> → Restore from backup

2. MCP Server Mode

Hermes as MCP server — exports tools to external MCP clients.
Entry: hermes mcp serve (planned; currently tools exposed via ACP only)
Module: tools/mcp_tool.py (currently MCP client only; server mode being added)

Current State: MCP Client Only

Hermes consumes external MCP servers via the config:

mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    timeout: 120
  github:
    url: "https://mcp-server.example.com/mcp"
    headers:
      Authorization: "Bearer sk-..."

Client code: tools/mcp_tool.py lines 1–300 (connection + discovery)

  • Stdio transport: command + args (spawn subprocess)
  • HTTP/StreamableHTTP: url + optional headers and auth
  • Tool discovery: MCP ListTools → agent registry injection with namespace prefix
  • Authentication: oauth and header modes for paid MCP endpoints
  • Sampling: MCP servers can request LLM completions back to Hermes (configurable limits)

MCP Configuration: hermes_cli/mcp_config.py lines 1–400 (parsing, validation, auth setup)


3. MCP Client Mode

Config file: ~/.hermes/config.yaml (mcp_servers section)
Client implementation: tools/mcp_tool.py lines 100–600 (async client loop)

Authentication Methods

  1. OAuth (--auth oauth)

    • Redirects to provider OAuth endpoint
    • Stores refresh token in ~/.hermes/secrets/mcp-<name>.json
    • Automatic token refresh before expiration
  2. Header-based (--auth header)

    • Static bearer token or custom header in request
    • Stored in config or env var (e.g., MCP_<NAME>_TOKEN)
  3. Environment variables (default)

    • Resolved from shell env; no persistent storage
    • Suitable for ephemeral containers

Tool Namespacing

MCP tools are prefixed with their server name:

  • Server github with tool search_issues → github:search_issues (internal)
  • Display name: "Search Issues (github)" in UI

Namespace collision resolution: Last-registered server wins (reload order: bundled → user → config)
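The prefixing and display-name convention above can be sketched as a registration helper (hypothetical function and registry shape, not the actual Hermes registry API):

```python
def register_mcp_tools(registry: dict, server: str, tool_names: list[str]) -> dict:
    """Sketch of MCP tool namespacing.

    Each tool is keyed as "<server>:<tool>"; because dict writes overwrite,
    the last-registered server wins on collision, matching the reload order
    described above (bundled -> user -> config).
    """
    for name in tool_names:
        registry[f"{server}:{name}"] = {
            "server": server,
            "tool": name,
            # Display name like "Search Issues (github)"
            "display": f"{name.replace('_', ' ').title()} ({server})",
        }
    return registry
```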


4. ACP (Agent Client Protocol) Server

Purpose: Editor integration (VS Code, Zed, JetBrains, Cursor, Windsurf)
Server code: acp_adapter/server.py (full ACP 0.9.0 spec)
Lifecycle entry: acp_adapter/entry.py (server startup)

Key Components

| Component | File | Purpose |
| --- | --- | --- |
| HermesACPAgent | acp_adapter/server.py:95 | Main ACP agent class (subclass of acp.Agent) |
| SessionManager | acp_adapter/session.py | Manages editor session state, tool calls, streaming |
| MessageHandler | acp_adapter/events.py | Converts agent events → ACP protocol messages |
| Permissions | acp_adapter/permissions.py | Command approval callback (approval workflow) |

ACP Messages Handled

  • initialize() → Returns agent capabilities (models, tools, MCP servers)
  • new_session() → Create new chat session
  • load_session(session_id) → Restore saved conversation
  • fork_session() → Branch conversation
  • send(session_id, prompt, mode) → Process user query (mode = chat, task, architect)
  • set_session_model() → Switch LLM mid-session
  • set_session_config_option() → Update session settings
  • cancel_task() → Interrupt current work
  • list_sessions() → Browse saved chats
  • /slash commands → Exposed via _SLASH_COMMANDS dict (/help, /model, /memory, /tools)

Streaming Protocol

Agent responses stream as protocol events:

  • message_chunk — LLM text tokens (progressive rendering)
  • tool_call_start / tool_call_complete — Tool execution events
  • step_complete — Full turn finished
  • usage — Token count + cost estimate
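A client consuming this stream might fold the events into a final turn like the sketch below. The event dict shapes (`"type"`, `"text"`, `"tool"` keys) are illustrative assumptions, not the ACP wire format:

```python
def render_stream(events):
    """Hypothetical consumer of the streaming events listed above.

    Accumulates message_chunk text, collects tool-call names, and captures
    the usage event; stops at step_complete (end of turn).
    """
    text_parts, tool_calls, usage = [], [], None
    for event in events:
        kind = event["type"]
        if kind == "message_chunk":
            text_parts.append(event["text"])   # progressive rendering
        elif kind == "tool_call_start":
            tool_calls.append(event["tool"])
        elif kind == "usage":
            usage = event                      # token count + cost estimate
        elif kind == "step_complete":
            break                              # full turn finished
    return "".join(text_parts), tool_calls, usage
```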

5. Gateway: Messaging Platform Adapters

Base class: gateway/platforms/base.py line 887 (BasePlatformAdapter)
Directory: gateway/platforms/ (27 adapters)
Factory: gateway/run.py (_create_adapter())
Config: ~/.hermes/config.yaml (platforms section)

Supported Platforms (15+)

| Platform | File | Status |
| --- | --- | --- |
| Telegram | telegram.py | Full support (groups, channels, PM, media) |
| Discord | discord.py | Full (threads, slash commands, reactions) |
| Slack | slack.py | Full (threads, message updates, blocks) |
| WhatsApp | whatsapp.py | Full (via WhatsApp Business API) |
| Signal | signal.py | Full (via signald daemon) |
| Weixin (WeChat) | weixin.py | Full (groups, official accounts, mini programs) |
| WeChat Enterprise | wecom.py | Full (message callbacks, approvals) |
| Feishu (Lark) | feishu.py | Full (doc creation, file management) |
| DingTalk | dingtalk.py | Full (robot messages, card interaction) |
| Mattermost | mattermost.py | Full (self-hosted Slack alternative) |
| Matrix | matrix.py | Full (Synapse homeserver) |
| QQ Bot | qqbot/ | Full (QQ groups and DMs) |
| Email | email.py | Limited (inbound IMAP) |
| SMS | sms.py | Limited (Twilio provider) |
| Home Assistant | homeassistant.py | Limited (notification delivery only) |
| Webhook | webhook.py | Generic HTTP POST subscriptions |
| BlueBubbles | bluebubbles.py | iMessage relay from macOS |

Required Methods (All Adapters)

class BasePlatformAdapter(ABC):
    def __init__(self, config: PlatformConfig, platform: Platform):
        """Parse config, initialize state."""
    
    async def connect(self) -> bool:
        """Establish connection. Return True on success."""
    
    async def disconnect(self) -> None:
        """Stop listeners, close connections."""
    
    async def send(self, chat_id: str, text: str, **opts) -> SendResult:
        """Send text message. Return success/failure + message_id."""
    
    async def send_typing(self, chat_id: str) -> None:
        """Send typing indicator (ephemeral)."""
    
    async def send_image(self, chat_id: str, url: str, caption: str) -> SendResult:
        """Send image from URL."""
    
    async def get_chat_info(self, chat_id: str) -> dict:
        """Return {name, type, chat_id, members...}."""
    
    async def handle_message(self, event: MessageEvent) -> None:
        """Process inbound message (called by adapter internally)."""

Optional Methods

  • send_document(chat_id, file_path, caption) — File attachment
  • send_voice(chat_id, file_path) — Audio message
  • send_video(chat_id, file_path, caption) — Video
  • send_animation(chat_id, file_path, caption) — GIF/animation
  • send_image_file(chat_id, file_path, caption) — Local image

Adding a New Platform

See gateway/platforms/ADDING_A_PLATFORM.md (8,826 bytes). Key steps:

  1. Create adapter → gateway/platforms/<platform>.py, subclass BasePlatformAdapter
  2. Add enum → gateway/config.py, extend Platform enum
  3. Register in factory → gateway/run.py, add case in _create_adapter()
  4. Add auth map → gateway/run.py, if using custom auth (OAuth, tokens)
  5. Add CLI setup → hermes_cli/main.py, subcommand for platform-specific config
  6. Implement message routing → Use self.build_source() for session keys
  7. Handle media → Use cache_image_from_bytes(), cache_audio_from_bytes() for attachments
  8. Logging → Redact secrets in all log output
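The subclassing step can be illustrated with a toy in-memory adapter. The base class here is a reduced stand-in for gateway/platforms/base.py (only two of the required methods), so this is a sketch of the shape, not the real interface:

```python
import asyncio
from abc import ABC, abstractmethod

class BasePlatformAdapter(ABC):
    """Reduced stand-in for the real base class, for illustration only."""
    @abstractmethod
    async def connect(self) -> bool: ...
    @abstractmethod
    async def send(self, chat_id: str, text: str, **opts) -> dict: ...

class EchoAdapter(BasePlatformAdapter):
    """Toy adapter: 'delivers' by recording messages in memory."""
    def __init__(self):
        self.outbox = []

    async def connect(self) -> bool:
        # A real adapter would open a websocket or start long polling here
        return True

    async def send(self, chat_id: str, text: str, **opts) -> dict:
        self.outbox.append((chat_id, text))
        return {"ok": True, "message_id": str(len(self.outbox))}

async def demo():
    adapter = EchoAdapter()
    assert await adapter.connect()
    return await adapter.send("chat-1", "hello")
```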

6. Plugin System

Plugin storage: ~/.hermes/plugins/<name>/ (user), <repo>/plugins/<name>/ (bundled)
Manifest: Each plugin requires plugin.yaml + __init__.py
Discovery: hermes_cli/plugins.py lines 1–300
Hook execution: hermes_cli/plugins.py lines 400–600 (invoke_hook())

Plugin Manifest (plugin.yaml)

name: my-plugin
version: 1.0.0
description: What this plugin does
author: Your Name
requires_env:
  - SOME_API_KEY
  - secret_token: "OPTIONAL_SECRET"
provides_tools:
  - custom_tool_name
provides_hooks:
  - pre_llm_call
  - post_tool_call

Plugin Entry Point (__init__.py)

def register(ctx: PluginContext):
    """Called once during plugin load."""
    ctx.register_tool(my_tool)
    ctx.on("pre_llm_call", my_hook_handler)

Valid Hooks

| Hook | Fired When | Signature |
| --- | --- | --- |
| pre_tool_call | Before tool execution | (tool_name, args, **kwargs) |
| post_tool_call | After tool returns | (tool_name, result, **kwargs) |
| transform_tool_result | Transform tool output | (result: str) -> str |
| pre_llm_call | Before model inference | (messages, model, **kwargs) |
| post_llm_call | After model response | (response, **kwargs) |
| transform_terminal_output | Reformat terminal output | (output: str) -> str |
| pre_api_request | Before HTTP request | (method, url, **kwargs) |
| post_api_request | After HTTP response | (response, **kwargs) |
| on_session_start | Session begins | (session_id, **kwargs) |
| on_session_end | Session closes | (session_id, **kwargs) |
| on_session_finalize | Before persistence | (session_id, messages, **kwargs) |
| on_session_reset | Clear session | (session_id, **kwargs) |
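An invoke_hook-style dispatcher for these hooks might look like the sketch below. The class and method names are hypothetical; the only assumption taken from the table is that transform_* hooks rewrite a string while the others are fire-and-forget notifications:

```python
class HookBus:
    """Hypothetical sketch of plugin hook dispatch.

    Handlers registered for a hook run in registration order. For
    transform_* hooks, each handler's return value feeds the next,
    forming a filter chain.
    """
    def __init__(self):
        self.handlers = {}

    def on(self, hook: str, fn):
        self.handlers.setdefault(hook, []).append(fn)

    def invoke(self, hook: str, *args, **kwargs):
        # Notification-style hooks: call every handler, ignore returns
        for fn in self.handlers.get(hook, []):
            fn(*args, **kwargs)

    def transform(self, hook: str, value: str) -> str:
        # Filter-style hooks: chain each handler's output into the next
        for fn in self.handlers.get(hook, []):
            value = fn(value)
        return value
```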

Plugin Loading

  1. Bundled plugins (<repo>/plugins/*/) + excluded subdirs (memory/, context_engine/)
  2. User plugins (~/.hermes/plugins/*/)
  3. Project plugins (./.hermes/plugins/*/, opt-in via HERMES_ENABLE_PROJECT_PLUGINS)
  4. Pip entry-point plugins (exposed via hermes_agent.plugins entry group)

Later sources override earlier ones (name collisions).
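The override rule reduces to an ordered dict merge; a minimal sketch (hypothetical helper name), assuming each source maps plugin name to its loaded definition:

```python
def merge_plugin_sources(*sources: dict) -> dict:
    """Sketch of the precedence rule above: later sources override earlier
    ones on name collision (bundled -> user -> project -> pip entry points)."""
    merged = {}
    for source in sources:     # earliest (lowest-precedence) first
        merged.update(source)  # later dicts win on duplicate names
    return merged
```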


7. Tool & Toolset Interface

Tool registry: tools/ directory (40+ built-in tools)
Tool schema: Pydantic models + docstring introspection
Discovery: hermes_cli/tools_config.py (registry enumeration)

Standard Tool Pattern

# In tools/custom_tool.py
from typing import Annotated
from pydantic import BaseModel, Field

class CustomToolInput(BaseModel):
    query: str = Field(..., description="Search query")
    limit: int = Field(10, description="Max results")

def custom_tool(input: Annotated[CustomToolInput, "Tool name"]) -> str:
    """Tool description.
    
    Long description for system prompt (markdown).
    """
    # implementation goes here; must return a string for the agent loop
    return f"results for {input.query!r} (limit {input.limit})"

Tool Schema Generation

  1. Docstring parsing → Extract description
  2. Type hints → Build JSON Schema from Pydantic models
  3. Field descriptions → Pulled from Field(..., description=...)
  4. Registry injection → Tool available as both:
    • Built-in (direct function call)
    • MCP-prefixed (if exposed via server mode)
    • Plugin-registered (dynamic at runtime)
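Steps 1–3 can be illustrated with a stdlib-only introspection sketch. Hermes itself derives schemas from Pydantic models; this hypothetical helper uses bare type hints and a docstring instead, so it shows the idea rather than the real mechanism:

```python
import inspect
import typing

# Minimal Python-type -> JSON Schema type mapping (illustrative)
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_from_signature(fn) -> dict:
    """Sketch of steps 1-3: docstring -> description, type hints -> JSON
    Schema types, parameters without defaults -> required fields."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = (fn.__doc__ or "").strip()
    return {
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": props, "required": required},
    }
```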

Built-in Toolsets

| Toolset | Tools | Enabled By Default |
| --- | --- | --- |
| web | web_search, web_tools, browser | Yes |
| terminal | bash, python, code_execution | Conditional (sandbox) |
| files | file_operations, file_tools | Yes |
| vision | image_analysis, screenshot | Conditional (vision models) |
| audio | tts, speech_recognition | Conditional (audio hardware) |
| image_gen | image_generation | Conditional (API keys) |
| code | git_tools, github_tools | Conditional (auth) |

8. Skill Interface

Skill storage: ~/.hermes/skills/<category>/<skill-name>/ (user), <repo>/skills/ (bundled)
Manifest: SKILL.md (frontmatter + markdown)
Discovery: hermes_cli/skills_hub.py (hub + local detection)

SKILL.md Frontmatter

---
name: skill-identifier
description: One-line summary
version: 1.0.0
author: Author Name
license: MIT
metadata:
  hermes:
    tags: [python, deployment, ci-cd]
    related_skills: [other-skill-id, ...]
    requires: [python-3.11, docker]
    cost: "high"  # or "medium", "low", "free"
---

Skill Loading Modes

  1. Pre-armed — Bundled + user skills loaded at startup (in system prompt)
  2. Lazy-loaded — Hub skills loaded on-demand (/skill-name command)
  3. Indexed — FTS search across all available skills (local + Hub)
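The "Indexed" mode maps naturally onto SQLite FTS5 (the same engine named for session search elsewhere in this document). The table layout below is illustrative, not the actual Hermes schema:

```python
import sqlite3

def build_skill_index(skills: list[tuple[str, str]]) -> sqlite3.Connection:
    """Sketch: index (name, description) pairs in an FTS5 virtual table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE skills USING fts5(name, description)")
    db.executemany("INSERT INTO skills VALUES (?, ?)", skills)
    return db

def search_skills(db: sqlite3.Connection, query: str) -> list[str]:
    """Full-text search across both columns, best matches first."""
    rows = db.execute(
        "SELECT name FROM skills WHERE skills MATCH ? ORDER BY rank", (query,)
    )
    return [name for (name,) in rows]
```

This requires a CPython build with the FTS5 extension compiled in, which is the case for standard distributions.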

Skill Categories (Standard Convention)

skills/
├── github/
│   ├── github-auth/
│   ├── github-pr-workflow/
│   ├── github-code-review/
│   └── github-issues/
├── productivity/
│   ├── calendar-integration/
│   ├── task-automation/
│   └── email-management/
├── research/
│   ├── arxiv-search/
│   └── literature-summary/
└── ...

Skill Interaction

Users invoke skills via:

  • /skill-name — Load and execute skill
  • /skills — Browse available skills
  • /skills search <query> — FTS search
  • Auto-loading when agent detects task matches skill domain

9. Webhook Subscriptions & Cron Scheduling

Webhook Subscriptions

Storage: ~/.hermes/webhook_subscriptions.json
Config: hermes_cli/webhook.py lines 1–200
Platform adapter: gateway/platforms/webhook.py (HTTP listener)

hermes webhook subscribe my-github-events \
  --channel telegram:123456 \
  --description "GitHub push notifications"

Subscription object:

{
  "name": "my-github-events",
  "channel": "telegram:123456",
  "route": "/webhooks/my-github-events",
  "secret": "hmac-secret-auto-generated",
  "description": "GitHub push notifications",
  "created_at": "2026-04-20T...",
  "last_received": "2026-04-20T..."
}

Delivery: Hot-reloaded without gateway restart; webhook platform listens on host:port.
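Given the auto-generated HMAC secret in the subscription object, an inbound payload would typically be verified along these lines. The header name and hex-digest scheme are assumptions for illustration, not the documented Hermes wire format:

```python
import hashlib
import hmac

def verify_webhook(secret: str, body: bytes, signature_header: str) -> bool:
    """Sketch: recompute the HMAC-SHA256 of the raw body and compare it to
    the sender's signature in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```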

Cron Scheduling

Storage: ~/.hermes/cron/jobs.json
Scheduler daemon: cron/scheduler.py (background process)
CLI interface: hermes_cli/cron.py + cron/jobs.py

hermes cron create \
  --schedule "0 9 * * *" \
  --prompt "Generate daily standup report" \
  --deliver slack:#reports

Job object:

{
  "id": "job-uuid",
  "name": "daily-standup",
  "schedule": {"type": "cron", "value": "0 9 * * *"},
  "prompt": "Generate daily standup report",
  "deliver": ["slack:#reports", "local"],
  "enabled": true,
  "skills": ["productivity/task-summary"],
  "script": null,
  "next_run_at": "2026-04-21T09:00:00Z",
  "last_run_at": "2026-04-20T09:00:00Z",
  "last_status": "success",
  "repeat": {"times": null, "completed": 0}
}

Delivery modes:

  • local — Message to agent home chat (CLI or primary messaging platform)
  • <platform>:<channel> — Route to specific platform/group
  • email:<recipient> — Email delivery
  • webhook:<url> — POST result to external webhook

10. Python API for Embedding

Status: Hermes is primarily a CLI tool; there is no supported public Python API for embedding.

Partial Python Access (Internal Use Only)

The codebase contains internal Python modules that could be imported:

# Not officially supported; API may change between versions
from acp_adapter.session import SessionManager
from agent.anthropic_adapter import AnthropicAdapter  # or other model adapters
from gateway.platforms.base import BasePlatformAdapter
from hermes_cli.config import load_config

Recommended Path (Official)

For embedding Hermes in Python applications:

  1. Use ACP client library (if available)

    • Connect to running Hermes ACP server
    • Send prompts via protocol
    • Receive streamed responses
  2. Subprocess mode

    import subprocess
    import json
    
    result = subprocess.run(
        ["hermes", "chat", "-q", "your prompt"],
        capture_output=True, text=True
    )
    # parse stdout
  3. Webhook callback

    • Have cron or gateway POST results to your API
    • Query Hermes via HTTP (webhook subscriber)

Summary of Integration Points

| Integration | Type | Entry Point | Config |
| --- | --- | --- | --- |
| CLI | Commands | hermes <cmd> | hermes_cli/main.py |
| MCP Client | Protocol | mcp_servers in config.yaml | hermes_cli/mcp_config.py |
| ACP Server | Protocol | Editor → localhost:5000 | acp_adapter/ |
| Gateway | Messaging | hermes gateway run | ~/.hermes/config.yaml (platforms) |
| Plugins | System | ~/.hermes/plugins/<name>/ | plugin.yaml + __init__.py |
| Tools | Functions | tools/ directory | tools/ + toolset config |
| Skills | Docs | ~/.hermes/skills/ | SKILL.md frontmatter |
| Webhooks | HTTP | /webhooks/<name> | webhook_subscriptions.json |
| Cron | Scheduling | hermes cron ... | ~/.hermes/cron/jobs.json |

Document compiled: 2026-04-20
Scope: Medium thoroughness — covers all major integration surfaces with file:line citations for implementation details.

Hermes Agent: Philosophy from Issues, Commits, and Design Artifacts

Date: April 20, 2026 | Scope: Commit history (last 100), Issues (open + closed, recent 30), Releases (v0.6–v0.10), Documentation (README, CONTRIBUTING, SECURITY, AGENTS.md)


1. Release Rhythm: Every 1–5 Days, Hyperactive Iteration

Cadence: v0.2.0 (Mar 12) → v0.10.0 (Apr 16) = 8 releases in 35 days = ~4-day average interval.

What warrants a release:

  • Single major feature gets shipped (Profiles in v0.6, Memory providers in v0.7, Tool Gateway in v0.10)
  • 50–180+ commits bundled; 16–60+ issues resolved per release
  • Each release has explicit highlights + 🏗️/📱/🔧 subsections organized by component

Changelog voice: Declarative, technical, evidence-linked. Every highlight cites PR numbers. Example from v0.10:

"Nous Tool Gateway — Paid Nous Portal subscribers now get automatic access to web search (Firecrawl), image generation (FAL / FLUX 2 Pro), text-to-speech (OpenAI TTS), and browser automation (Browser Use) through their existing subscription. No separate API keys needed."

No filler. No "improvements and bug fixes." Ship notes read like a technical spec, not marketing copy. This signals: ship fast, justify by impact, keep a running record.


2. Commit Message Discipline: Conventional Commits + Narrative Why

Pattern: Consistently type(scope): what across 100-commit sample.

Examples:

fix(tui): fix Linux Ctrl+C regression, remove double clipboard write
fix(agent): repair malformed tool_call arguments before API send
feat(plugins): make all plugins opt-in by default
chore(release): add jplew to AUTHOR_MAP

Observations:

  • Scope-first design: fix(tui), fix(gateway), feat(skills), chore(release) — tells you impact zone immediately
  • Atomic changes: Most commits are single-fix or single-feature; few merges visible in recent history (dcd763c Merge pull request #10125)
  • No narrative why-statements in commit bodies (visible log is subject-only) — suggests history is self-documenting via: (a) type prefix signals risk (fix < feat < chore), (b) scope signals blast radius, (c) PR numbers link to discussion
  • Author mapping: chore(release): add [name] to AUTHOR_MAP appears regularly — deliberate credit ritual, tribe-building signal

Philosophy revealed: Speed + clarity. Conventional Commits enforce scannability. Atomic commits enable safe rollback and bisect. No narrative prose in message bodies = trust the code to speak for itself; discussions live in PRs, not commit messages.
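The type(scope): subject convention described above is machine-checkable with a short regex; a sketch (simplified, ignoring optional Conventional Commits features like `!` breaking-change markers and scopeless subjects):

```python
import re

COMMIT_RE = re.compile(r"^(?P<type>\w+)\((?P<scope>[^)]+)\):\s+(?P<subject>.+)$")

def parse_commit(subject_line: str):
    """Sketch: split a conventional-commit subject into type, scope, subject."""
    m = COMMIT_RE.match(subject_line)
    return m.groupdict() if m else None
```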


3. Issues: What the Team Argues About

Sample: 30 recent (open + closed) issues, April 20, 2026.

Themes & frequency:

| Theme | Count | Examples | Signal |
| --- | --- | --- | --- |
| Provider/model support | 8 | #13061 (normalize_model_name breaks custom providers), #13042 (Ollama glm-5.1 malformed JSON), #13031 (Feishu gateway tool execution), #12835 (kimi-k2.5 temperature mode), #12790 (max_tokens fallback incomplete) | Polyglot-first: Hermes prioritizes multi-provider compatibility above all. Every model, every endpoint variant, must work. Bugs here are critical. |
| Gateway reliability | 6 | #13081 (socket directory glob), #13050 (Discord username surfacing), #13033 (Linux terminal freeze on paste), #13027 (skill_view HERMES_SESSION_PLATFORM check), #12868 (plugin loading post-restart) | Always-on assumption: Messaging gateways are production infrastructure, not toys. Freezes, race conditions, and session loss are P0. |
| Skills & autonomy | 5 | #13075 (memory/skill nudge counter), #13060 (smalltalk async subagents), #13041 (delegate_task idling), #13028 (_save_platform_tools stale state) | Learning loop is core: Autonomous skill creation and subagent delegation are not nice-to-haves; failures here block the main thesis. |
| Config & migration | 4 | #13024 (setup wizard misclassifies providers), #13025 (OpenClaw migration stale checks), #12881 (UnicodeDecodeError on update) | Low-entropy user onboarding: Team invests heavily in setup, migration, config clarity — new users should not hit Python tracebacks. |
| Session/memory search | 3 | #13056 (session search time-bounded queries), #13079 (read_file dedup cache pollution) | FTS5 is central: Full-text search and session recall are differentiators; bugs here are visibility breaches. |
| Security & approval | 2 | (Mostly closed; no current open issues) | Mature stance: Security is baked in; few incoming reports suggest approval system + secret redaction are working. |
| Docs & help | 0 | (none) | Docs debt managed silently: No open doc issues; docs are fixed in PRs alongside features. |

What gets rejected/closed-wontfix: None visible in sample. Instead, issues get rapid triage and hotfix PRs. Example: #13059 (Chinese: custom provider model ID corruption) opened Apr 20 14:19, closed 14:25 — 6-minute fix cycle. Triaging for speed, not gatekeeping.


4. Pull Requests: Merge Pattern & Reviewer Philosophy

Sample: 20 recent PRs (all open, i.e., pending review/merge).

Patterns:

| Pattern | Count | Examples | Signal |
| --- | --- | --- | --- |
| Bug fix + test | 10 | #13080 (file-tools TERMINAL_CWD), #13078 (null/scalar in get_disabled_skills), #13077 (stream consumer first message), #13073 (transport types + Anthropic normalize) | Tests-as-spec: Fixes include test additions. Reviewers validate via test changes, not code inspection alone. |
| Feature with integration | 6 | #13082 (signet crypto audit trail), #13070 (Docker env var overrides), #13066 (Feishu media delivery), #13063 (Discord history backfill) | Completeness bar: Features ship with full integration: config, docs, tests, example commands. Partial implementations rejected. |
| Cross-platform compat | 3 | #13064 (right-click paste), #13074 (QQCloseError backoff), #13073 (transport types) | "Works on my machine" is not acceptable: Platform divergence (macOS/Linux/Windows, TUI/CLI/gateway, all providers) is actively hunted and fixed. |
| Reasoning/thinking model support | 2 | #13076 (api_server load reasoning config), #13071 (Copilot ACP async + streaming) | Extended thinking is first-class: o1-like models are treated as a language-level feature, not an afterthought. |
| Closed (rejected/duplicate) | 1 | #13069 (duplicate of #13070) | Deduplication happens before full review. Prevents churn; suggests strong issue triage discipline. |

No visible rejection comments. Open PRs are either in the merge queue or waiting for CI/feedback. No evidence of "we won't take this" — instead, community PRs get integrated or redirected to Skills Hub.


5. Philosophy Consistency Check: Stated vs. Revealed

README's Claims

"The only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions."

Evidence in artifacts:

  • Commit: feat(plugins): convert disk-guardian skill into a bundled plugin (068b2248) — skills auto-generated and bundled
  • Issue #13075: "Memory/Skill Nudge Counter Issues" — nudging is a tracked, prioritized feature
  • Release v0.7: "Pluggable Memory Provider Interface" + Honcho integration — memory is extensible
  • Session search: #13056 (time-bounded queries), FTS5 in hermes_state.py — past conversation search is core
  • Claim validated.

README's Claims

"Run it on a $5 VPS, a GPU cluster, or serverless infrastructure... It's not tied to your laptop."

Evidence in artifacts:

  • Release v0.6: "Profiles — Multi-Instance Hermes", "Docker Container", "Modal and Docker container skills/credentials mounting"
  • Release v0.7: "API Server Session Continuity" for Open WebUI integration
  • Release v0.6: "Feishu/Lark + WeCom Platform Support" (multiple messaging platforms)
  • Commit: feat(whatsapp): implement send_voice (ed76185c) — multi-platform audio support
  • Claim validated.

CONTRIBUTING.md's Priority Ladder

  1. Bug fixes (crashes, incorrect behavior, data loss)
  2. Cross-platform compatibility
  3. Security hardening
  4. Performance & robustness
  5. New skills
  6. New tools
  7. Documentation

Evidence in artifacts:

  • Open issue #13081: "glob pattern doesn't match socket directories, causing daemon startup failure" — crash prioritized
  • Open issue #13033: "setup freezes terminal on Linux" — cross-platform compat bug is open
  • Closed issue #12881: "UnicodeDecodeError on config migration" — fixed within 24h, crash behavior
  • Release v0.7: "Gateway Hardening" section (5 PRs for race conditions, flood control, compression death spirals)
  • Release v0.7: "Security: Secret Exfiltration Blocking" (secret patterns, credential directory protections)
  • Commits show: mostly fix(*) and occasional feat(*), rarely docs(*)
  • Hierarchy is operational, not aspirational.

SECURITY.md's Trust Model

"Single Tenant: The system protects the operator from LLM actions, not from malicious co-tenants."

Evidence in artifacts:

  • Approval system is configurable: approvals.mode: "on" (default), "auto", "off" — operator choice, not enforcement
  • Issue #13056: Session search lacks time-bounded queries — user owns their own search results
  • Commits: fix(agent): repair malformed tool_call arguments (9eeaaa4f) — LLM output validation is proactive
  • Trust model aligns with single-user, self-service assumption.

AGENTS.md's Developer Ethic

"AIAgent class — core conversation loop, tool dispatch, session persistence" (minimal docs, code is canonical)

Evidence in artifacts:

  • AGENTS.md is light on narrative, heavy on code paths and class signatures
  • git log shows atomic, type-prefixed commits — code structure must be self-evident
  • Release notes cite file changes + PR numbers, not architecture essays
  • Assumes developers read code-first, prose second.

6. The One Thing Only Hermes Would Say

Claim: "Hermes is the only agent with a closed learning loop that works across all platforms and all models simultaneously, without lock-in."

Breakdown:

  1. Closed learning loop: Autonomous skill creation from conversations (#13075 nudge counter), FTS5 session search (#13056), procedural memory with pluggable backends (v0.7), subagent delegation with memory isolation (SECURITY.md). No other agent framework bundles all three.

  2. Works across all models & providers simultaneously: 100+ commits in last 30 days are provider/model-specific fixes (normalize_model_name, max_tokens fallback, temperature mode detection, context length resolution). v0.7 release alone adds 6 new provider patterns (Anthropic long-context tier 429, Fireworks context detection, DashScope international, Bearer auth for MiniMax). This is obsessive-compulsive model compatibility work, not an afterthought. Other frameworks pick 2–3 providers and call it done.

  3. Works across all platforms: Telegram (webhook + polling), Discord (multi-workspace OAuth), Slack, WhatsApp, Signal, Matrix, Mattermost, Feishu/Lark, WeCom, Home Assistant, Email, CLI, TUI, API server. v0.6 added Feishu + WeCom in a single release. No multi-platform agent does this.

  4. Zero lock-in: Can switch models via hermes model [provider:model]; can swap providers via fallback chains; can migrate memory via pluggable providers (Honcho as official reference implementation); can run skills locally or on Modal/Docker; can deploy to any terminal backend. README explicitly names 10+ inference endpoints. This is not marketing; every claim is enforced by tests and releases.

Evidence:

  • Commit cadence: 100+ fixes in last 100 commits = tight feedback loop on real breakage, not wishful features.
  • Issue triage: #13059 (custom provider corruption) fixed in 6 minutes = production incident response, not hobby project.
  • Release velocity: v0.6–v0.10 in 35 days, each release with 50–180+ commits = team moving fast on observable friction.
  • Architecture: Single AIAgent class, toolsets.py for platform abstraction, tools/registry.py for dynamic tool discovery, hermes_state.py for unified session storage = no special cases, no platform-specific forks.

Why this claim is defensible:

  • Claude Desktop (LLM IDE plugin), ChatGPT+ (web-only), Copilot (Microsoft-locked), and Cursor (editor-only) are all single-platform, single-model. Hermes refuses single-platform constraints at the code level.
  • OpenRouter, Together, Anyscale do provider aggregation, but only for inference. They don't ship a conversation UI, session search, memory system, skill creation, multi-platform gateway, terminal backend abstraction, and approval system that all work together across providers.
  • The learning loop (skills + FTS5 search + nudges) is novel to Hermes. Retrieval-augmented generation is standard; procedural skill generation from execution traces is not.

7. What the Issues Reveal About Team Pressure & Resistance

Community pressure vs. Team decisions:

| Issue | Status | Team Response | Signal |
| --- | --- | --- | --- |
| #13049 (XMPP channel) | OPEN | No response yet | Team doesn't auto-accept every platform request. XMPP is niche; Hermes gates on "broadly useful" (CONTRIBUTING.md). |
| #13041 (delegate_task idle timeout) | OPEN | Waiting for fix | Subagent autonomy is broken; team is aware and triaging. Not deprioritized. |
| #13060 (smalltalk async subagents) | OPEN | Feature request labeled | Community asking for async subagent UX (side conversations). Team tracking but not committed. |
| #13072 (CLI auto-queue mode) | OPEN | Feature request | Users want queueing behavior; team hasn't shipped yet. Not a priority vs. crashes. |
| v0.7 Pluggable Memory | RELEASED | Shipped Honcho integration | Community pressure on memory backends → formalized provider ABC → released. Team listened. |
| v0.6 Ordered Fallback Providers | RELEASED | Shipped, closes #1734 | Issue #1734 is 10+ months old; finally resolved. Team prioritizes based on data, not age. |

Team stance: Hear community, ship high-ROI fixes (crash bugs, provider compat), gate speculative features (XMPP, auto-queue). Not dismissive; not obligated to every request.


Summary: The Observable Philosophy

| Pillar | Evidence | Implication |
|---|---|---|
| Speed over perfection | 4-day release cycle, 6-minute hotfix triage, 100+ commits per release | Iterate fast, break stuff carefully (with tests), learn from production |
| Compatibility is correctness | 8+ open provider bugs, v0.7 release has 20+ provider-specific fixes | Support all models, all endpoints, all variants. Parity across providers is a feature, not a bug list. |
| Autonomy is the goal | Learning loop (skills + memory + search), subagent delegation, FTS5 session recall are prioritized | Agent should know what it learned, improve over time, and think independently. User is a collaborator, not a command source. |
| Multi-platform is mandatory | 10+ messaging platforms, multiple terminal backends, serverless + container + local execution | Hermes is not a CLI tool. It's a distributed agent runtime that happens to have a CLI. |
| Trust is individual | Single-tenant model, operator-owned approval gates, credential isolation, sandbox defaults | Hermes trusts you, not the cloud. You own your keys, your sessions, your skills. |
| Code is the spec | Light documentation, atomic commits, type-prefixed scope, PR-driven discussions | Read the code. Run the tests. Ship it. Narratives are secondary to executable truth. |

One final signal: The release notes quote PR numbers obsessively. Example: v0.7 has 50+ PR citations in 20 lines of highlights. This is radical transparency. Readers can click any claim and see the implementation. Other projects write prose; Hermes writes audit trails.

Hermes Agent Testing Patterns

1. Test Structure

Test suite: 673 Python files across 16 subfolders. Total: ~200K LOC of test code.

Directory layout:

  • tests/gateway/ (173 files, 66K LOC) — platform adapters (Telegram, Discord, Slack, Matrix, API server)
  • tests/tools/ (149 files, 51K LOC) — tool execution, skill manager, code execution, MCP
  • tests/hermes_cli/ (119 files, 36K LOC) — CLI behaviors, commands, config parsing
  • tests/run_agent/ (55 files, 18K LOC) — agent loop, message handling, context compression
  • tests/agent/ (44 files, 19K LOC) — LLM adapters (Anthropic, OpenAI, Bedrock, etc.)
  • tests/cli/ (44 files, 9K LOC) — older CLI entry points
  • tests/integration/ (8 files) — batch runner, checkpoint resumption, voice channels
  • tests/e2e/ (1 file, +conftest) — full gateway pipeline command dispatch
  • tests/skills/, tests/plugins/, tests/cron/ — optional integrations

Test-to-source coverage ratios (LOC test / LOC source):

  • gateway: 1.24× (most heavily tested)
  • run_agent: 1.45× (agent loop critical path)
  • tools: 1.10× (broad tool coverage)
  • agent: 0.90× (LLM adapter coverage)
  • hermes_cli: 0.70× (CLI comparatively under-tested)
  • skills: 0.29× (marked as lightly tested)

Undertested: skills/ directory (only 6 tests for 7K LOC), optional integrations (honcho, daytona, modal).


2. Test Framework & Runner

Framework: pytest 9.0+, pytest-asyncio, pytest-xdist (parallel -n auto)

Configuration (pyproject.toml:131-136):

[tool.pytest.ini_options]
testpaths = ["tests"]
markers = ["integration: marks tests requiring external services"]
addopts = "-m 'not integration' -n auto"
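The `-m 'not integration'` filter means marked tests are deselected in every default pytest run and must be selected explicitly with `pytest -m integration`. A minimal sketch of how the marker behaves (test names and config contents here are illustrative, not from the repo):

```python
import pytest

# Collected by the default suite: addopts applies -m 'not integration'.
def test_config_defaults_parse():
    config = {"model": "anthropic:claude", "toolsets": []}  # stand-in config
    assert config["model"].startswith("anthropic")

# Deselected by default; run explicitly with: pytest -m integration
@pytest.mark.integration
def test_real_provider_roundtrip():
    pytest.skip("requires live provider credentials")
```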

CI/CD (.github/workflows/tests.yml):

  • Single job runs all unit tests (excluding integration/, e2e/) on Ubuntu with Python 3.11
  • Separate e2e job runs tests/e2e/ only (marked @pytest.mark.asyncio)
  • Python version: 3.11 only (no 3.12 matrix)
  • Timeout: 20min for unit tests, 10min for e2e
  • Coverage config: None (no explicit .coverage, no CI report)

Dependencies:

  • pytest>=9.0.2,<10
  • pytest-asyncio>=1.3.0,<2 (async test harness)
  • pytest-xdist>=3.0,<4 (parallel execution)
  • No pytest-cov, no coverage thresholds enforced in CI

Test isolation (hermetic environment, conftest.py:192-338):

  • All credential env vars (API keys, tokens, passwords) unset per test
  • HERMES_HOME redirected to per-test tmpdir (prevents ~/.hermes leakage)
  • TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0 (deterministic datetime/locale)
  • AWS IMDS disabled (avoids 2s metadata service timeout)
  • Plugin singleton reset between tests
  • 30-second per-test timeout (SIGALRM on Unix, no-op on Windows)
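The isolation layers above can be sketched as two helpers, the cores of what conftest.py wraps in autouse fixtures (a simplified reconstruction, not the repo's exact code; the credential-variable list is an illustrative subset):

```python
import os
import signal
from contextlib import contextmanager

CREDENTIAL_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")  # illustrative subset

@contextmanager
def hermetic_environment(tmp_home):
    """Scrub credentials, redirect state, pin locale; restore on exit."""
    saved = {k: os.environ.get(k)
             for k in (*CREDENTIAL_VARS, "HERMES_HOME", "TZ", "LANG")}
    try:
        for var in CREDENTIAL_VARS:
            os.environ.pop(var, None)              # no test reaches a real provider
        os.environ["HERMES_HOME"] = str(tmp_home)  # never touch ~/.hermes
        os.environ["TZ"] = "UTC"                   # deterministic datetimes
        os.environ["LANG"] = "C.UTF-8"             # deterministic locale
        yield
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value

@contextmanager
def enforce_timeout(seconds=30):
    """Per-test watchdog: SIGALRM on Unix, no-op where unavailable."""
    if not hasattr(signal, "SIGALRM"):
        yield
        return
    def _timeout(signum, frame):
        raise TimeoutError(f"test exceeded {seconds}s")
    old = signal.signal(signal.SIGALRM, _timeout)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old)
```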

3. Mocking & Fixtures Patterns

LLM mocking strategy:

  • Custom doubles (no responses/respx/pytest-httpx). E.g., restart_test_helpers.py:71-108 manually constructs GatewayRunner with:
    • runner._update_runtime_status = MagicMock()
    • runner.hooks.emit = AsyncMock()
    • runner.session_store = MagicMock() with ._entries = {}
  • Mock uses unittest.mock (standard library)
  • LLM calls are stubbed at the adapter level, not the HTTP layer

External service stubs:

  • Telegram: sys.modules mock (pre-test, gateway/conftest.py:21-62) provides fake ChatType constants, error classes
  • Discord: comprehensive sys.modules mock (gateway/conftest.py:65-142) covering discord.Intents, discord.app_commands group/command registration
  • Platform adapters: RestartTestAdapter (base class for Telegram mock) overrides send(), connect(), disconnect(), get_chat_info()

Fixture patterns (conftest.py):

  • _hermetic_environment (autouse) — environment isolation
  • tmp_dir(tmp_path) — per-test temp directory
  • mock_config() — minimal hermes config dict (model, toolsets, terminal backend)
  • _ensure_current_event_loop() (autouse) — creates event loop for sync tests calling asyncio.get_event_loop().run_until_complete()
  • _enforce_test_timeout() (autouse) — 30s SIGALRM

Shared fixtures per subsystem:

  • tests/gateway/conftest.py — telegram/discord sys.modules mocks
  • tests/run_agent/conftest.py — 34 lines of undocumented helpers
  • tests/e2e/conftest.py — 266 lines covering full gateway setup

Snapshot testing: NOT used. No pytest-snapshot, no golden files.


4. What They Test HARD (3 Most-Tested Modules)

4a. gateway/api_server (1.24× coverage ratio)

Heavy test focus on OpenAI-compatible API server multimodal routing.

Example: tests/gateway/test_api_server.py:55-100 (ResponseStore LRU eviction):

class TestResponseStore:
    def test_lru_eviction(self):
        store = ResponseStore(max_size=3)
        store.put("resp_1", {"output": "one"})
        store.put("resp_2", {"output": "two"})
        store.put("resp_3", {"output": "three"})
        store.put("resp_4", {"output": "four"})
        assert store.get("resp_1") is None  # evicted (least recently used)
        assert store.get("resp_2") is not None
        assert len(store) == 3

Invariant defended: LRU cache correctness (response chaining via previous_response_id).
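The test pins down behavior that a minimal OrderedDict-based store would satisfy (a sketch, not the repo's actual ResponseStore implementation):

```python
from collections import OrderedDict

class ResponseStore:
    """Minimal LRU map: get() refreshes recency, put() evicts the oldest entry."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop least recently used

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def __len__(self):
        return len(self._data)
```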

173 gateway test files defend:

  • Session routing across platforms (Telegram, Discord, Slack, Matrix, API server)
  • Message queueing during agent busy state
  • Approval/deny workflow authorization
  • Multi-platform skill registration

4b. tools/ (1.10× coverage)

149 files test tool execution safety and skill mutation.

Example: tests/tools/test_skill_improvements.py:46-68 (fuzzy patch skill):

def test_whitespace_trimmed_match(self):
    skill = "---\nname: ws-skill\n\n    def hello():\n        print(\"hi\")"
    _create_skill("ws-skill", skill)
    # Patch with no leading whitespace (LLM output shape)
    result = _patch_skill("ws-skill", "def hello():\n    print(\"hi\")", 
                          "def hello():\n    print(\"hello world\")")
    assert result["success"] is True
    content = (self.skills_dir / "ws-skill" / "SKILL.md").read_text()
    assert 'print("hello world")' in content

Invariant defended: Skill mutation is whitespace-agnostic (LLMs produce indentation variance).
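One way to get that invariant is to match lines with surrounding whitespace stripped, then splice the replacement back in at the indentation found at the match site (a sketch of the idea; `_patch_skill`'s internals may differ):

```python
from typing import Optional

def fuzzy_replace(content: str, old: str, new: str) -> Optional[str]:
    """Find `old` in `content` ignoring per-line indentation, then splice in
    `new` re-indented to match the first matched line."""
    content_lines = content.splitlines()
    old_lines = [line.strip() for line in old.splitlines()]
    for start in range(len(content_lines) - len(old_lines) + 1):
        window = content_lines[start:start + len(old_lines)]
        if [line.strip() for line in window] == old_lines:
            # Preserve the indentation of the first matched line.
            indent = window[0][:len(window[0]) - len(window[0].lstrip())]
            replacement = [indent + line for line in new.splitlines()]
            return "\n".join(content_lines[:start] + replacement
                             + content_lines[start + len(old_lines):])
    return None  # no whitespace-trimmed match found
```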

Key test areas:

  • Skill creation/patching with fuzzy matching (7 test files)
  • Code execution modes (POSIX-only, Windows workarounds)
  • File sync performance over SSH
  • Security: symlink traversal, OSV package checks, hidden directory traversal
  • MCP OAuth token refresh and cold-load cache expiry

4c. run_agent (1.45× coverage)

55 files test agent loop orchestration and message lifecycle.

Example: tests/test_hermes_state.py:24-47 (session CRUD):

def test_end_session_preserves_original_end_reason(self, db):
    """First end_reason wins — compression split must not be overwritten."""
    db.create_session(session_id="s1", source="cli")
    db.end_session("s1", end_reason="compression")
    first_ended_at = db.get_session("s1")["ended_at"]
    
    # Stale CLI holds old session_id and calls end_session() again
    time.sleep(0.01)
    db.end_session("s1", end_reason="resumed_other")
    
    session = db.get_session("s1")
    assert session["end_reason"] == "compression"  # First win, not overwritten
    assert session["ended_at"] == first_ended_at

Invariant defended: Session end reason is idempotent (prevents re-compression of already-compressed sessions).
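A guarded UPDATE is enough to get those first-write-wins semantics (a sketch against sqlite3; the repo's SessionDB schema will differ in detail):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    source     TEXT,
    ended_at   REAL,
    end_reason TEXT)""")

def end_session(session_id, end_reason):
    # The WHERE guard makes the call idempotent: only the first
    # end_reason/ended_at pair is ever written.
    conn.execute(
        "UPDATE sessions SET ended_at = ?, end_reason = ? "
        "WHERE session_id = ? AND ended_at IS NULL",
        (time.time(), end_reason, session_id),
    )

conn.execute("INSERT INTO sessions (session_id, source) VALUES ('s1', 'cli')")
end_session("s1", "compression")
end_session("s1", "resumed_other")  # stale caller: silently ignored
row = conn.execute(
    "SELECT end_reason FROM sessions WHERE session_id = 's1'").fetchone()
# row[0] == "compression": the first reason wins
```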

Key test areas:

  • SessionDB SQLite CRUD, FTS5 search, export
  • Message append, tool_call_count increments
  • Token count accumulation
  • Context compression lifecycle
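The FTS5 recall path amounts to a virtual table plus a MATCH query (a minimal sketch; table and column names are illustrative, not hermes_state.py's actual schema; requires an SQLite build with FTS5, the default in CPython's bundled SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE messages_fts USING fts5(session_id, role, content)")
rows = [
    ("s1", "user", "how do I rotate my API key"),
    ("s1", "assistant", "use hermes auth remove then auth add"),
    ("s2", "user", "schedule a daily cron summary"),
]
conn.executemany("INSERT INTO messages_fts VALUES (?, ?, ?)", rows)

# MATCH does tokenized full-text search; bm25() ranks hits by relevance.
hits = conn.execute(
    "SELECT session_id, content FROM messages_fts "
    "WHERE messages_fts MATCH ? ORDER BY bm25(messages_fts)",
    ("cron",),
).fetchall()
# hits → [("s2", "schedule a daily cron summary")]
```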

5. What They Test LIGHTLY or NOT AT ALL

Undertested Areas:

Skills auto-creation path: tests/skills/ has only 6 tests (telephony, memento cards, YouTube quiz, Google OAuth). No test for the core create-from-experience path that the framework markets as "self-improving". Skill fuzzy patching is tested, but not the full discovery→execution→patch→learn loop.

Gateway multi-platform routing: 173 gateway tests but no parametrized routing tests across >5 platforms. No test verifies that a message routes correctly across Telegram→Discord→Matrix for the same user. No test covers platform fallback when one adapter disconnects.

RL training pipeline (tinker-atropos): Only tests/tools/test_rl_training_tool.py (120 LOC) tests file handle cleanup and process termination. No test for actual RL training, reward signal, policy gradient, or convergence. The tinker-atropos/ directory has 0 test files in the main codebase (tinker-atropos is vendored, not tested end-to-end).

Evidence:

  • /tests/skills/ — 6 test files, 2.1K LOC vs 7.2K source LOC (0.29× ratio)
  • /tests/integration/ — no RL training, only batch runner + daytona/modal + voice
  • /tinker-atropos/ — 0 test files (directory exists for inference only)
  • /tests/e2e/ — 1 file, command dispatch only, no skill→agent loop

CLI parsing: Only basic smoke tests. No parametrized testing of 400+ CLI flags. No property-based fuzz of config YAML malformations.

Security testing gaps:

  • No prompt injection tests except test_cron_prompt_injection.py (cron-specific)
  • No test for tool hallucination (agent asks for non-existent tool)
  • No fuzzing of LLM output parsing (malformed JSON responses)
  • Only test_sql_injection.py (column name + query parameterization)
  • Only test_worktree_security.py + test_symlink_prefix_confusion.py (filesystem only)

6. Test Quality Signals

Integration Tests

  • tests/integration/ (8 files): batch runner checkpointing, daytona/modal terminal backends, voice channels, web tool interactions
  • No end-to-end agent task (e.g., "write code, test, refine" loop)
  • No cross-platform skill sharing test (skill created in Telegram, used in Discord)

Property-Based Testing

  • Not used. No hypothesis. No QuickCheck-style generators.
  • All tests use concrete fixtures and exhaustive case enumeration

Adversarial/Security Testing

  • Limited. Only regression tests, no proactive attack-surface exploration:
    • test_cron_prompt_injection.py — regex fuzzing for bypass patterns
    • test_sql_injection.py — assertion-only (no execution attack, only static checks)
    • test_tirith_security.py — Tirith-framework vulnerability scanning (external)
    • test_worktree_security.py — no Git command injection test, only symlink escape

Time-Sensitive Tests

  • test_timezone.py — 15.7K LOC of deterministic timezone/locale edge cases
  • test_hermes_logging.py — handler lifecycle with mock clock (no freezegun)
  • test_hermes_state.py — uses time.sleep(0.01) for idempotency checks
  • No use of freezegun or pytest-freezegun
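The time.sleep(0.01) pattern could be replaced by an injected clock; this is not something the suite does, just a sketch of the sleep-free alternative (all names hypothetical):

```python
class FakeClock:
    """Deterministic stand-in for time.time(): advances only when told to."""
    def __init__(self, start=1_000.0):
        self.now = start
    def time(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

class SessionRecord:
    """Toy session with first-write-wins end semantics, clock injected."""
    def __init__(self, clock):
        self._clock = clock
        self.ended_at = None
        self.end_reason = None
    def end(self, reason):
        if self.ended_at is None:  # first end wins
            self.ended_at = self._clock.time()
            self.end_reason = reason

clock = FakeClock()
record = SessionRecord(clock)
record.end("compression")
first = record.ended_at
clock.advance(0.01)            # no real sleep needed
record.end("resumed_other")
# record.end_reason == "compression" and record.ended_at == first
```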

Flaky Test Patches

  • 34 @pytest.mark.skipif markers (platform/privilege checks)
  • 0 @pytest.mark.flaky (no retries configured)
  • CI timeout 20min (suggests some slow tests, no flake reporting)

Flake evidence:

  • tests/honcho_plugin/test_client.py — 3 skipif markers (asyncio event loop issues)
  • tests/tools/test_ssh_environment.py — entire file conditionally skipped if SSH key unavailable
  • No recent history of "flaky test: reverted" commits in logs

7. The Philosophy of Their Test Suite

What the test suite reveals about trust in stochastic behavior:

The hermes-agent testing philosophy trades breadth of LLM output validation for depth of system invariants.

They Test Heavily:

  • Deterministic system paths — session lifecycle, message queueing, skill mutation, API routing
  • Mocked LLM calls — no real inference, only mock adapter responses
  • Configuration binding — 10K+ LOC of config parsing, default handling, precedence
  • Platform adapters — mock Telegram/Discord at the Python module level, assert send/receive correctness

They Don't Test (Implicitly Trust LLM):

  • Tool use correctness — no assertion that Claude actually uses the web_search tool when asked; only that the tool executes
  • Prompt quality — no tests that check agent reasoning, only that it doesn't crash
  • Context window management — compression is tested for correctness, but no test validates that it preserves semantic meaning

Core Assumption:

"If the system harness works correctly and the LLM is called with the right tools/prompt, the LLM will do the right thing."

This is pragmatic for an agentic framework — the framework can't predict LLM output, so it tests:

  1. Does the harness handle LLM stochasticity gracefully? (retry logic, fallback, error classification)
  2. Does the harness correctly isolate bad outputs? (tool sandbox, skill safety checks, prompt injection blocking)
  3. Do the system components compose correctly? (adapter → session → agent loop → storage)

Example of this philosophy in action:

  • test_hermes_state.py tests that session IDs survive compression — NOT that compression preserves conversation semantics (which only the LLM can judge)
  • tests/gateway/test_api_server.py tests that /v1/chat/completions returns 200 with correct schema — NOT that the response is sensible
  • test_skill_improvements.py tests that patches apply successfully — NOT that the patched skill works better than the original (depends on LLM feedback)

The suite is prescriptive (system must behave this way) but not proscriptive (we don't guard against bad LLM choices). This is the right engineering trade-off for an LLM-based system where the LLM is the source of truth.

hermes-agent Learning Index

Source

Explorations

2026-04-20 1946 (default · 3 agents)

  • Architecture — structure, entry points, core abstractions, design decisions
  • Code Snippets — 10 illustrative snippets (agent loop, tools, prompts, providers)
  • Quick Reference — install, quickstart, config, gotchas

Key insights:

  • Hermes is a self-improving agent with closed learning loops — Atropos RL feeds back into the agent via tinker-atropos submodule.
  • Provider-agnostic core: single abstraction over Anthropic / Bedrock / Gemini / OpenAI-compatible (~7+ providers).
  • Security-first prompt assembly scans context for injection patterns before handing off to the LLM.
  • Extensibility via Skills-as-Markdown (procedural memory), pluggable Tools registry, and MCP support.
  • SQLite + FTS5 state store backs memory and context compression.

2026-04-20 2218 (deep · 3 agents — philosophy lens)

  • Issues & Commits — release rhythm, commit style, argued themes, stated vs revealed philosophy
  • Testing — tests/ structure, coverage gaps, what they trust LLM to do vs what they assert
  • API Surface — CLI / MCP / ACP / gateway / plugin / tool / skill / webhook / cron / Python-embed

Key insights from the deep run:

  • Release rhythm: every ~4 days. Changelogs cite PRs, not marketing copy.
  • Commit discipline: strict conventional commits (type(scope): what), atomic, no narrative — code is the spec; discussions live in PRs.
  • What they argue about: provider compatibility (8+ bugs), gateway reliability (6), skill autonomy (5), config/onboarding (4), session search (3).
  • Tests are prescriptive, not proscriptive — they assert the harness works, they trust the LLM to use it correctly. Skills auto-creation loop and cross-platform routing have thin coverage; tinker-atropos has zero test files.
  • Python embedding is unsupported — hermes is a daemon/CLI, not a library. Use ACP or subprocess.
  • The one claim only hermes makes: "The only agent with a closed learning loop that works across all platforms and all models simultaneously, without lock-in."