Deep review: plugins/claude/agents-sdk-creator (2026-01-27)

Deep Review: plugins/claude/agents-sdk-creator

Date: 2026-01-27
Reviewer: Claude Opus 4.5 (with Codex and Claude partner reviews)
Scope: Full skill directory (SKILL.md, 5 examples, 8 references, 2 scripts)


Executive Summary

The agents-sdk-creator skill is a comprehensive, well-structured guide for building Python applications with the Claude Agent SDK (claude-agent-sdk). It covers most of the SDK surface area across 465 lines of main workflow, 5 graduated examples, 8 detailed references, and 2 validation scripts. Cross-referencing against the official Anthropic documentation at platform.claude.com confirms the skill's core API claims are accurate: the query() vs ClaudeSDKClient distinction, message types, hook events, permission modes, custom tools, and anti-patterns all match.

The skill's strengths are its progressive disclosure (minimal to complete examples), security-first design (hooks, permissions, sandbox anti-patterns), and the automated validation script. The main areas for improvement are: verifying the UserMessage.uuid checkpointing claim against runtime behavior, hardening the bash-based validator, covering a few missing SDK features (receive_messages(), interrupt(), McpHttpServerConfig), and adding mechanisms to stay current as the SDK evolves.

Overall quality: High. The skill is production-ready as it stands; the targeted improvements below are recommended, not blocking.


Detailed Findings

1. API Accuracy (verified against official docs)

Correct:

  • query() vs ClaudeSDKClient feature matrix matches the official Python reference exactly
  • All 5 message types and 4 content block types match official docs
  • 6 Python hook events and 3 TypeScript-only events correctly identified
  • 4 permission modes accurately described
  • @tool decorator, create_sdk_mcp_server(), and mcp__<server>__<tool> naming convention all correct
  • Anti-patterns (break in receive_response(), deprecated claude_code_sdk, bypassPermissions + allowUnsandboxedCommands) all confirmed by official docs

Discrepancies found:

  • The official SDK overview page shows hooks passed to query() in its "Hooks" tab example, contradicting the Python reference which says hooks are ClaudeSDKClient-only. The skill follows the Python reference (correct behavior), but users may encounter this inconsistency in Anthropic's own docs.
  • UserMessage.uuid is used for checkpointing but is not explicitly documented in the official UserMessage type definition. The SDK is alpha (0.1.x) and docs may lag runtime. Needs runtime verification.

2. Workflow Design

The 7-phase workflow (Requirements -> Setup -> Research -> Build -> Hooks/Permissions -> Subagents -> Validation) is thorough and well-sequenced. Each phase builds on the previous, and the skill correctly makes Phase 3 (Research) optional for simple agents.

Codex's feedback: The phases are "verbose but purposeful." A "fast track" path that jumps to Phase 4 with reminders to revisit hooks/subagents later would serve experienced users. This is a valid observation — the skill optimizes for correctness over speed.

3. Examples Quality

The 5 examples form a clear progression:

| Example | Lines | API | Complexity |
|---|---|---|---|
| minimal-agent | ~20 | query() | Simplest possible |
| standard-agent | ~55 | query() | Error handling, budget, message processing |
| multi-turn-agent | ~60 | ClaudeSDKClient | REPL, interrupt, streaming |
| complete-agent | ~214 | ClaudeSDKClient | Custom tools, hooks, subagents, structured output |
| checkpoint-agent | ~80 | ClaudeSDKClient | File checkpointing, try/rollback |

Each example is self-contained and runnable. The checkpoint-agent uses the uv shebang pattern from CLAUDE.md. The complete-agent demonstrates every major feature in a single coherent script.

Gap: No example shows receive_messages() (all examples use receive_response()). No example demonstrates interrupt() programmatically (the multi-turn example shows it via user input but not as an automated pattern).

4. References Quality

The 8 reference files total ~2,260 lines of documentation covering:

  • api-reference.md (434 lines): Complete type definitions, method signatures, options fields. High quality. Missing permission_prompt_tool_name field and McpHttpServerConfig type.
  • patterns-guide.md (529 lines): 12 patterns from one-shot to checkpointing, plus anti-patterns. Most comprehensive file. Missing an event-driven/background-watcher pattern.
  • hooks-and-permissions.md (330 lines): Permission evaluation order, hook events, callback signatures, 4 common patterns (security, audit, redirect, rate limiting). Excellent clarity.
  • tools-reference.md (195 lines): Built-in tools table, custom tool creation, external MCP servers. Solid coverage.
  • subagents-reference.md (241 lines): Constraints, invocation patterns, multi-agent architectures. Good coverage of edge cases.
  • sessions-reference.md (170 lines): Session capture, resume, fork, pipeline, ClaudeSDKClient continuity. Clear and practical.
  • structured-outputs.md (183 lines): Pydantic integration, schema design tips, complex examples. Well done.
  • sandbox-reference.md (174 lines): Security considerations, excludedCommands vs allowUnsandboxedCommands comparison table. Strong security focus.

5. Validation Scripts

detect-sdk-context.sh (118 lines): Checks for existing agent scripts, project dependencies, and MCP config. Includes a safety check preventing accidental creation in ~/.claude/plugins/cache/. Simple but effective.

validate-agent-script.sh (372 lines): Comprehensive bash/grep-based validator checking 20+ conditions. Catches real mistakes (deprecated package, hooks with query(), missing enable_file_checkpointing, Task in subagent tools).

Weakness: The grep-based approach is brittle. It can miss multiline arguments, aliased imports, and conditionally defined code. It can also false-positive on break statements in unrelated loops. Both Codex and the Claude reviewer flagged this.

6. Comparison to Official Quickstart

The official Anthropic quickstart (platform.claude.com/docs/en/agent-sdk/quickstart) walks through building a single bug-fixing agent. The skill goes far beyond this with 5 graduated examples, 12 patterns, and comprehensive reference docs. The skill is a superset of the official getting-started material.

7. Staleness Risk

The SDK is alpha (0.1.x) and evolving. The skill has no mechanism to detect or flag when it falls behind. New hook events, permission modes, options fields, or API changes could silently make parts of the skill incorrect.


Actionable Items

Item 1: Verify UserMessage.uuid for checkpointing at runtime

Description: The checkpoint-agent example and patterns-guide Pattern 12 both rely on UserMessage.uuid to capture checkpoint restore points. However, the official Python SDK docs do not list uuid as a field on UserMessage. The rewind_files(user_message_uuid) method exists, implying the UUID comes from somewhere, but the docs don't say where.

Why it matters: If UserMessage.uuid doesn't exist at runtime, the entire file checkpointing workflow is broken. This is the skill's highest-risk claim.

Suggested approach:

  1. Create a minimal test script that creates a ClaudeSDKClient with enable_file_checkpointing=True
  2. Send a query and iterate receive_response(), printing type(msg), dir(msg), and msg for each UserMessage
  3. If uuid exists: document it as an undocumented but functional field with a note
  4. If uuid doesn't exist: investigate where the checkpoint UUID comes from (possibly SystemMessage.data, AssistantMessage metadata, or a different API) and update all checkpoint examples
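The inspection in steps 1 and 2 could be sketched roughly as follows. `describe_message` is a hypothetical helper (not part of the SDK) that records what each message object actually exposes; the commented-out wiring at the bottom assumes the ClaudeSDKClient usage described in this review and has not been run against a live session.

```python
from typing import Any


def describe_message(msg: Any) -> dict[str, Any]:
    """Summarise one message object for the uuid investigation (hypothetical helper)."""
    public_attrs = sorted(name for name in dir(msg) if not name.startswith("_"))
    return {
        "type": type(msg).__name__,
        "has_uuid": hasattr(msg, "uuid"),
        "uuid": getattr(msg, "uuid", None),
        "attrs": public_attrs,
    }


# Where this would plug in (assumed wiring; needs the SDK and a live session):
#
#   options = ClaudeAgentOptions(enable_file_checkpointing=True)
#   async with ClaudeSDKClient(options=options) as client:
#       await client.query("...")
#       async for msg in client.receive_response():
#           print(describe_message(msg))
```

If `has_uuid` comes back false for every UserMessage, the `attrs` list is the starting point for step 4.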

Files affected: examples/checkpoint-agent.md, references/patterns-guide.md (Pattern 12), SKILL.md (Phase 4 checkpointing note)

Notes and status: ready


Item 2: Document receive_messages() vs receive_response() and interrupt()

Description: The skill's build phase and all 5 examples exclusively use receive_response(). The API reference documents both receive_messages() and receive_response() as ClaudeSDKClient methods, but the workflow never explains when to choose one over the other. Similarly, interrupt() is mentioned in the multi-turn example's user input handling but never explained as a programmatic pattern.

Why it matters: Developers building streaming UIs, progress monitors, or cancellation-aware agents need to understand these methods. Codex specifically flagged this gap.

Suggested approach:

  1. Add a brief subsection to SKILL.md Phase 4 (or a callout box) contrasting the two methods:
    • receive_response(): yields messages until ResultMessage — use for standard workflows
    • receive_messages(): yields ALL messages including from subagents — use when you need full visibility into subagent activity
  2. Add a note on interrupt(): "Call await client.interrupt() to stop the current task mid-execution. The client remains usable — send a new query() to continue."
  3. Consider adding a short example pattern (Pattern 13) showing programmatic interrupt with timeout
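A possible shape for the interrupt-with-timeout pattern from step 3 is sketched below. It only assumes the two methods named above (`receive_response()` and `interrupt()`); `collect_with_timeout` and `FakeClient` are hypothetical names, and `FakeClient` is a stand-in so the sketch runs without the SDK. It also assumes the response stream ends on its own after an interrupt, which should be confirmed against the real client.

```python
import asyncio
from typing import Any


async def collect_with_timeout(client: Any, timeout_s: float) -> tuple[list[Any], bool]:
    """Drain receive_response(); call interrupt() if it runs past timeout_s.

    Returns (messages, interrupted). The loop is never exited with `break`;
    the stream is always read to its natural end.
    """
    messages: list[Any] = []

    async def drain() -> None:
        async for msg in client.receive_response():
            messages.append(msg)

    task = asyncio.create_task(drain())
    done, _pending = await asyncio.wait({task}, timeout=timeout_s)
    if done:
        task.result()  # re-raise any error from the stream
        return messages, False
    await client.interrupt()
    await task  # assumption: the stream finishes once the interrupt lands
    return messages, True


class FakeClient:
    """Hypothetical stand-in for ClaudeSDKClient, for illustration only."""

    def __init__(self, limit: int = 1000) -> None:
        self._stopped = False
        self._limit = limit

    async def receive_response(self):
        n = 0
        while not self._stopped and n < self._limit:
            yield f"message {n}"
            n += 1
            await asyncio.sleep(0.01)
        yield "result"

    async def interrupt(self) -> None:
        self._stopped = True


if __name__ == "__main__":
    msgs, interrupted = asyncio.run(collect_with_timeout(FakeClient(), 0.05))
    print(len(msgs), interrupted)
```

Waiting on the drain task instead of cancelling it keeps the consumer from abandoning the stream mid-response, which is the anti-pattern the skill already warns about.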

Files affected: SKILL.md (Phase 4), optionally references/patterns-guide.md

Notes and status: ready


Item 3: Add missing API types to reference

Description: Three items present in the official docs (two types and one options field) are missing from the skill's api-reference.md:

  • McpHttpServerConfig — HTTP-based MCP server connection type
  • permission_prompt_tool_name field on ClaudeAgentOptions: the MCP tool name used for permission prompts
  • CLIConnectionError — intermediate error class between ClaudeSDKError and CLINotFoundError

Why it matters: Users consulting the api-reference as their primary SDK docs will have an incomplete picture of available configuration options and error handling.

Suggested approach:

  1. Add McpHttpServerConfig to the MCP Server Types section in api-reference.md
  2. Add permission_prompt_tool_name: str | None = None to the ClaudeAgentOptions field listing
  3. Add CLIConnectionError to the error types section (already partially documented since CLINotFoundError inherits from it)

Files affected: references/api-reference.md, references/tools-reference.md (MCP server section)

Notes and status: ready


Item 4: Clarify CLAUDE.md loading requirements

Description: The skill says setting_sources=["project"] loads CLAUDE.md, but the relationship between setting_sources and system_prompt preset is underspecified. The official docs indicate that CLAUDE.md content is injected as part of the Claude Code system prompt preset, meaning both settings must work together.

Why it matters: A user who sets setting_sources=["project"] without the claude_code preset (or vice versa) will not get the expected behavior and won't understand why.

Suggested approach:

  1. In SKILL.md Phase 4 "With project context" section, add an explicit note: "Both setting_sources=["project"] AND system_prompt={"type": "preset", "preset": "claude_code"} are needed to load CLAUDE.md instructions into the agent."
  2. In references/sessions-reference.md or references/api-reference.md, clarify the dependency
  3. Update the validation script to check for setting_sources without the preset (and vice versa) as a warning
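The warning in step 3 could be prototyped in Python before porting it to the bash validator. The sketch below is one way to do it with the standard `ast` module; `check_claude_md_loading` is a hypothetical name, and the check only looks at literal keyword arguments on `ClaudeAgentOptions(...)` calls, so options built up dynamically would be missed.

```python
import ast


def _is_claude_code_preset(value) -> bool:
    """True if value is a literal dict like {"type": "preset", "preset": "claude_code"}."""
    if not isinstance(value, ast.Dict):
        return False
    try:
        literal = ast.literal_eval(value)
    except (ValueError, TypeError):
        return False  # dict contains non-literal parts; cannot decide statically
    return literal.get("type") == "preset" and literal.get("preset") == "claude_code"


def check_claude_md_loading(source: str) -> list[str]:
    """Warn when a ClaudeAgentOptions(...) call sets only one of the two settings."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "ClaudeAgentOptions":
            continue
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        has_sources = "setting_sources" in kwargs
        has_preset = _is_claude_code_preset(kwargs.get("system_prompt"))
        if has_sources and not has_preset:
            warnings.append(f"line {node.lineno}: setting_sources without the claude_code preset")
        elif has_preset and not has_sources:
            warnings.append(f"line {node.lineno}: claude_code preset without setting_sources")
    return warnings
```

Returning warnings instead of raising keeps this in line with the "as a warning" wording above.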

Files affected: SKILL.md, references/api-reference.md, scripts/validate-agent-script.sh

Notes and status: ready


Item 5: Harden validation script with Python AST analysis

Description: The validate-agent-script.sh (372 lines) uses bash grep and echo | grep patterns to detect imports, decorator usage, dict literals, and anti-patterns. This is brittle against multiline code, aliased imports, conditional definitions, and string content that happens to match patterns.

Why it matters: False negatives give a false sense of correctness. False positives erode trust in the validator. Both Codex and the Claude reviewer flagged this. Specific known issues:

  • break detection can false-positive on break in unrelated loops (e.g., a for loop processing items)
  • Multiline allowed_tools lists may not be detected
  • Aliased imports (from claude_agent_sdk import query as q) evade detection

Suggested approach:

  1. Keep the bash script as a fast "lint" pass for quick sanity checks
  2. Create a companion Python script (validate_agent_ast.py) that uses the ast module to:
    • Parse the file into an AST
    • Walk imports to verify claude_agent_sdk (not claude_code_sdk)
    • Find async for loops and check for break within loops that iterate over receive_response()
    • Check decorator usage (@tool) and verify create_sdk_mcp_server presence
    • Detect ClaudeAgentOptions keyword arguments for hooks/tools/permissions analysis
  3. Update the validation phase (Phase 7) to recommend the Python validator as primary, bash as fallback

Files affected: New file scripts/validate_agent_ast.py, scripts/validate-agent-script.sh (keep as-is), SKILL.md (Phase 7)

Notes and status: ready


Item 6: Add "fast track" workflow path for experienced users

Description: The 7-phase workflow is thorough but can feel heavy for experienced developers building simple agents. Codex noted it's "verbose but purposeful" and suggested a fast-track option.

Why it matters: Users who already know what they want (e.g., "build a one-shot code reviewer with structured output") shouldn't need to walk through requirements gathering and research phases.

Suggested approach:

  1. Add a "Fast Track" section near the top of SKILL.md, after the Overview, with a decision flowchart:
    • "Need one-shot task? -> See minimal-agent example, skip to Phase 4"
    • "Need multi-turn? -> See multi-turn-agent example, skip to Phase 4"
    • "Need hooks/custom tools? -> See complete-agent example, start at Phase 4"
    • "Complex/unfamiliar? -> Follow all 7 phases"
  2. Each fast-track path links directly to the relevant example and notes which phases to revisit (e.g., "After building, review Phase 5 for security hooks and Phase 7 for validation")

Files affected: SKILL.md

Notes and status: ready


Item 7: Add event-driven / background automation pattern

Description: The 12 patterns cover one-shot, interactive, multi-agent, session, sandbox, and checkpoint workflows. Missing is an event-driven pattern where an agent watches for external triggers (file changes, queue messages, webhooks) and reacts autonomously.

Why it matters: CI/CD listeners, deployment watchers, PR review bots, and monitoring agents are common real-world use cases. Codex specifically identified this gap.

Suggested approach:

  1. Add Pattern 13 to references/patterns-guide.md: "Event-Driven Automation Agent"
  2. The pattern should show:
    • An outer event loop (e.g., watching a directory, polling an API, reading from a queue)
    • Creating a new query() call or ClaudeSDKClient session for each event
    • Cost tracking across events with cumulative budget enforcement
    • Graceful shutdown handling
  3. Example use case: Watch a directory for new .py files and run lint + fix on each one

Files affected: references/patterns-guide.md

Notes and status: ready


Item 8: Add SDK version tracking and staleness prevention

Description: The skill has no mechanism to detect or flag when it falls behind the SDK. The SDK is alpha and evolving — new hook events, options fields, or API changes could silently make parts of the skill incorrect.

Why it matters: Stale documentation that looks authoritative is worse than no documentation. Users will follow outdated patterns and get confused when they don't work.

Suggested approach:

  1. Add a <!-- sdk-version: X.Y.Z --> comment at the top of SKILL.md and api-reference.md indicating the SDK version the docs were last verified against
  2. Add a "Last verified" line in the SKILL.md header: "Last verified against: claude-agent-sdk 0.1.x (2026-01-27)"
  3. Create a simple script (scripts/check-sdk-version.sh) that:
    • Runs pip show claude-agent-sdk to get the installed version
    • Compares against the documented version
    • Warns if they differ
  4. Document a maintenance cadence: "Review this skill against SDK changelog when the major or minor version changes"
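The comparison logic behind such a check could be sketched in Python as below (the review proposes a shell script; this is only an illustration of the logic). It assumes the `<!-- sdk-version: X.Y.Z -->` marker format from step 1 and reads the installed version through `importlib.metadata` instead of parsing `pip show` output. The function names and the version numbers in the example are hypothetical.

```python
import re
from importlib import metadata

_MARKER = re.compile(r"<!--\s*sdk-version:\s*(\d+)\.(\d+)\.(\d+)\s*-->")


def documented_version(text: str):
    """Extract (major, minor, patch) from the sdk-version comment, or None."""
    match = _MARKER.search(text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_version(package: str = "claude-agent-sdk"):
    """Return (major, minor, patch) of the installed package, or None if absent."""
    try:
        raw = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    return tuple(int(part) for part in match.groups()) if match else None


def drift(documented: tuple, installed: tuple) -> str:
    """Classify the gap: 'match', 'patch' (warn only), or 'review' (major/minor changed)."""
    if documented == installed:
        return "match"
    if documented[:2] == installed[:2]:
        return "patch"
    return "review"


if __name__ == "__main__":
    doc = documented_version("<!-- sdk-version: 0.1.4 -->")  # hypothetical version
    print(doc, installed_version())
```

Separating "patch" from "review" mirrors the maintenance cadence in step 4: any difference is reported, but only a major or minor change calls for a full re-verification.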

Files affected: SKILL.md, references/api-reference.md, new file scripts/check-sdk-version.sh

Notes and status: ready


Item 9: Add branding guidelines note

Description: The official SDK overview mentions branding guidelines: users can say "Claude Agent" or "{YourAgentName} Powered by Claude" but should NOT say "Claude Code" or "Claude Code Agent" when naming their agents. The skill doesn't mention this.

Why it matters: Developers building production agents need to know the branding constraints to avoid naming issues. This is a simple addition.

Suggested approach:

  1. Add a brief note at the end of SKILL.md Phase 1 (Requirements Gathering) or Phase 4 (Build):
    • "When naming your agent, follow Anthropic's branding guidelines: use 'Claude Agent', 'Claude', or '{YourAgentName} Powered by Claude'. Do not use 'Claude Code' or 'Claude Code Agent' in agent names."
  2. Link to the official overview for full guidelines

Files affected: SKILL.md

Notes and status: ready


Item 10: Improve break detection in validate script

Description: The validation script's break detection is a known false-positive risk. It checks for any break statement in a file that also contains receive_response, which will flag break in completely unrelated loops.

Why it matters: False positives reduce trust in the validator and train users to ignore its warnings. This is a quick targeted fix independent of the larger AST validator effort (Item 5).

Suggested approach:

  1. In validate-agent-script.sh, replace the current heuristic:
    # Current (line ~350): checks for ANY break + ANY receive_response in same file
    if echo "$CONTENT" | grep -qE '^\s+break\s*$' && echo "$CONTENT" | grep -qE 'async\s+for.*receive_response'; then
    With a more targeted check that looks for break within 30 lines after an async for.*receive_response line:
    # Better: check for break within the body of a receive_response loop.
    # grep needs -E for \s+ to work, and a flag variable is used because a
    # piped while-loop exits 0 when there are no matching lines at all.
    break_in_loop=false
    while IFS=: read -r line_num _; do
        end_line=$((line_num + 30))
        if echo "$CONTENT" | sed -n "${line_num},${end_line}p" | grep -qE '^\s+break\s*$'; then
            break_in_loop=true
            break
        fi
    done < <(echo "$CONTENT" | grep -nE 'async\s+for.*receive_response')
    if [ "$break_in_loop" = true ]; then
  2. This is still imperfect: a fixed 30-line window can miss a break in a longer loop body and can still catch a break in a nested or following loop. The AST validator in Item 5 is the proper fix, but this removes the false positives caused by break statements elsewhere in the file.

Files affected: scripts/validate-agent-script.sh

Notes and status: ready


Summary

| # | Item | Priority | Effort |
|---|---|---|---|
| 1 | Verify UserMessage.uuid for checkpointing | Critical | Small |
| 2 | Document receive_messages() vs receive_response() and interrupt() | High | Small |
| 3 | Add missing API types (McpHttpServerConfig, permission_prompt_tool_name, CLIConnectionError) | Medium | Small |
| 4 | Clarify CLAUDE.md loading requirements | Medium | Small |
| 5 | Harden validation with Python AST script | Medium | Medium |
| 6 | Add fast-track workflow path | Medium | Small |
| 7 | Add event-driven automation pattern | Low | Medium |
| 8 | Add SDK version tracking / staleness prevention | Low | Small |
| 9 | Add branding guidelines note | Low | Trivial |
| 10 | Improve break detection in validate script | Low | Small |

External Review Sources

  • Codex (via pairctl, chat 12fd57ee): Flagged receive_messages()/interrupt() gap, validator brittleness, suggested fast-track path and event-driven pattern
  • Claude (via pairctl): Verified all 9 API claim categories, flagged UserMessage.uuid risk, McpHttpServerConfig gap, and CLAUDE.md loading clarification
  • Official docs: platform.claude.com/docs/en/agent-sdk/overview, /python, /quickstart — fetched and cross-referenced 2026-01-27