Date: 2026-01-27

Reviewer: Claude Opus 4.5 (with Codex and Claude partner reviews)

Scope: Full skill directory (SKILL.md, 5 examples, 8 references, 2 scripts)
The agents-sdk-creator skill is a comprehensive, well-structured guide for building Python applications with the Claude Agent SDK (claude-agent-sdk). It covers the full SDK surface area across 465 lines of main workflow, 5 graduated examples, 8 detailed references, and 2 validation scripts. Cross-referencing against the official Anthropic documentation at platform.claude.com confirms the skill's core API claims are accurate: the query() vs ClaudeSDKClient distinction, message types, hook events, permission modes, custom tools, and anti-patterns all match.
The skill's strengths are its progressive disclosure (minimal to complete examples), security-first design (hooks, permissions, sandbox anti-patterns), and the automated validation script. The main areas for improvement are: verifying the UserMessage.uuid checkpointing claim against runtime behavior, hardening the bash-based validator, covering a few missing SDK features (receive_messages(), interrupt(), McpHttpServerConfig), and adding mechanisms to stay current as the SDK evolves.
Overall quality: High. The skill is production-ready with targeted improvements.
Correct:
- `query()` vs `ClaudeSDKClient` feature matrix matches the official Python reference exactly
- All 5 message types and 4 content block types match official docs
- 6 Python hook events and 3 TypeScript-only events correctly identified
- 4 permission modes accurately described
- `@tool` decorator, `create_sdk_mcp_server()`, and `mcp__<server>__<tool>` naming convention all correct
- Anti-patterns (`break` in `receive_response()`, deprecated `claude_code_sdk`, `bypassPermissions` + `allowUnsandboxedCommands`) all confirmed by official docs
Discrepancies found:
- The official SDK overview page shows hooks passed to `query()` in its "Hooks" tab example, contradicting the Python reference which says hooks are `ClaudeSDKClient`-only. The skill follows the Python reference (correct behavior), but users may encounter this inconsistency in Anthropic's own docs.
- `UserMessage.uuid` is used for checkpointing but is not explicitly documented in the official `UserMessage` type definition. The SDK is alpha (0.1.x) and docs may lag runtime. Needs runtime verification.
The 7-phase workflow (Requirements -> Setup -> Research -> Build -> Hooks/Permissions -> Subagents -> Validation) is thorough and well-sequenced. Each phase builds on the previous, and the skill correctly makes Phase 3 (Research) optional for simple agents.
Codex's feedback: The phases are "verbose but purposeful." A "fast track" path that jumps to Phase 4 with reminders to revisit hooks/subagents later would serve experienced users. This is a valid observation — the skill optimizes for correctness over speed.
The 5 examples form a clear progression:
| Example | Lines | API | Complexity |
|---|---|---|---|
| minimal-agent | ~20 | `query()` | Simplest possible |
| standard-agent | ~55 | `query()` | Error handling, budget, message processing |
| multi-turn-agent | ~60 | `ClaudeSDKClient` | REPL, interrupt, streaming |
| complete-agent | ~214 | `ClaudeSDKClient` | Custom tools, hooks, subagents, structured output |
| checkpoint-agent | ~80 | `ClaudeSDKClient` | File checkpointing, try/rollback |
Each example is self-contained and runnable. The checkpoint-agent uses the uv shebang pattern from CLAUDE.md. The complete-agent demonstrates every major feature in a single coherent script.
Gap: No example shows receive_messages() (all examples use receive_response()). No example demonstrates interrupt() programmatically (the multi-turn example shows it via user input but not as an automated pattern).
The 8 reference files total ~2,260 lines of documentation covering:
- api-reference.md (434 lines): Complete type definitions, method signatures, options fields. High quality. Missing `permission_prompt_tool_name` field and `McpHttpServerConfig` type.
- patterns-guide.md (529 lines): 12 patterns from one-shot to checkpointing, plus anti-patterns. Most comprehensive file. Missing an event-driven/background-watcher pattern.
- hooks-and-permissions.md (330 lines): Permission evaluation order, hook events, callback signatures, 4 common patterns (security, audit, redirect, rate limiting). Excellent clarity.
- tools-reference.md (195 lines): Built-in tools table, custom tool creation, external MCP servers. Solid coverage.
- subagents-reference.md (241 lines): Constraints, invocation patterns, multi-agent architectures. Good coverage of edge cases.
- sessions-reference.md (170 lines): Session capture, resume, fork, pipeline, ClaudeSDKClient continuity. Clear and practical.
- structured-outputs.md (183 lines): Pydantic integration, schema design tips, complex examples. Well done.
- sandbox-reference.md (174 lines): Security considerations, `excludedCommands` vs `allowUnsandboxedCommands` comparison table. Strong security focus.
detect-sdk-context.sh (118 lines): Checks for existing agent scripts, project dependencies, and MCP config. Includes a safety check preventing accidental creation in ~/.claude/plugins/cache/. Simple but effective.
validate-agent-script.sh (372 lines): Comprehensive bash/grep-based validator checking 20+ conditions. Catches real mistakes (deprecated package, hooks with query(), missing enable_file_checkpointing, Task in subagent tools).
Weakness: The grep-based approach is brittle. It can miss multiline arguments, aliased imports, and conditionally defined code. It can also false-positive on break statements in unrelated loops. Both Codex and the Claude reviewer flagged this.
The official Anthropic quickstart (platform.claude.com/docs/en/agent-sdk/quickstart) walks through building a single bug-fixing agent. The skill goes far beyond this with 5 graduated examples, 12 patterns, and comprehensive reference docs. The skill is a superset of the official getting-started material.
The SDK is alpha (0.1.x) and evolving. The skill has no mechanism to detect or flag when it falls behind. New hook events, permission modes, options fields, or API changes could silently make parts of the skill incorrect.
Description: The checkpoint-agent example and patterns-guide Pattern 12 both rely on UserMessage.uuid to capture checkpoint restore points. However, the official Python SDK docs do not list uuid as a field on UserMessage. The rewind_files(user_message_uuid) method exists, implying the UUID comes from somewhere, but the docs don't say where.
Why it matters: If UserMessage.uuid doesn't exist at runtime, the entire file checkpointing workflow is broken. This is the skill's highest-risk claim.
Suggested approach:
- Create a minimal test script that creates a `ClaudeSDKClient` with `enable_file_checkpointing=True`
- Send a query and iterate `receive_response()`, printing `type(msg)`, `dir(msg)`, and `msg` for each `UserMessage`
- If `uuid` exists: document it as an undocumented but functional field with a note
- If `uuid` doesn't exist: investigate where the checkpoint UUID comes from (possibly `SystemMessage.data`, `AssistantMessage` metadata, or a different API) and update all checkpoint examples
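The verification steps above could be sketched roughly as follows. This is a minimal sketch, not a tested script: it assumes the SDK names used in this review (`ClaudeSDKClient`, `ClaudeAgentOptions`, `UserMessage`, `enable_file_checkpointing`) behave as the skill describes, and the helper names `describe_message` and `probe_user_messages` are hypothetical.

```python
def describe_message(msg) -> dict:
    """Summarize one message object so the presence of `uuid` can be checked."""
    return {
        "type": type(msg).__name__,
        "has_uuid": hasattr(msg, "uuid"),
        "uuid": getattr(msg, "uuid", None),
        "public_attrs": sorted(n for n in dir(msg) if not n.startswith("_")),
    }


async def probe_user_messages(prompt: str) -> list[dict]:
    """Run one query with checkpointing enabled and report on every UserMessage.

    Assumption: the import names and the `enable_file_checkpointing` option
    match the installed SDK version; adjust them if the SDK differs.
    """
    # Imported lazily so describe_message() stays usable without the SDK installed.
    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient, UserMessage

    options = ClaudeAgentOptions(enable_file_checkpointing=True)
    reports = []
    async with ClaudeSDKClient(options=options) as client:
        await client.query(prompt)
        async for msg in client.receive_response():
            if isinstance(msg, UserMessage):
                reports.append(describe_message(msg))
    return reports
```

Running `asyncio.run(probe_user_messages("Create hello.txt"))` and inspecting `has_uuid` in each report would settle the question. If no `UserMessage` is yielded at all, that is itself a finding worth recording.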
Files affected: examples/checkpoint-agent.md, references/patterns-guide.md (Pattern 12), SKILL.md (Phase 4 checkpointing note)
Notes and status: ready
Description: The skill's build phase and all 5 examples exclusively use receive_response(). The API reference documents both receive_messages() and receive_response() as ClaudeSDKClient methods, but the workflow never explains when to choose one over the other. Similarly, interrupt() is mentioned in the multi-turn example's user input handling but never explained as a programmatic pattern.
Why it matters: Developers building streaming UIs, progress monitors, or cancellation-aware agents need to understand these methods. Codex specifically flagged this gap.
Suggested approach:
- Add a brief subsection to SKILL.md Phase 4 (or a callout box) contrasting the two methods:
  - `receive_response()`: yields messages until `ResultMessage`; use for standard workflows
  - `receive_messages()`: yields ALL messages including from subagents; use when you need full visibility into subagent activity
- Add a note on `interrupt()`: "Call `await client.interrupt()` to stop the current task mid-execution. The client remains usable; send a new `query()` to continue."
- Consider adding a short example pattern (Pattern 13) showing programmatic interrupt with timeout
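One possible shape for the programmatic interrupt-with-timeout pattern is sketched below. It only assumes that the client exposes `query()`, `receive_response()`, and `interrupt()` as described in this review; the function name `query_with_timeout` is hypothetical, and whether the real client needs its remaining messages drained after an interrupt should be checked against the SDK.

```python
import asyncio


async def query_with_timeout(client, prompt: str, timeout_s: float):
    """Send a prompt and collect messages, interrupting if it takes too long.

    Returns (messages, timed_out). Messages received before the timeout are kept.
    """
    messages = []

    async def drain():
        # receive_response() is expected to stop after the final result message.
        async for msg in client.receive_response():
            messages.append(msg)

    await client.query(prompt)
    try:
        await asyncio.wait_for(drain(), timeout=timeout_s)
        return messages, False
    except asyncio.TimeoutError:
        # Ask the agent to stop the current task; the caller may send a new query afterwards.
        await client.interrupt()
        return messages, True
```

Because the function relies only on duck typing, it can be exercised against a fake client in unit tests before it is pointed at a real `ClaudeSDKClient`.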
Files affected: SKILL.md (Phase 4), optionally references/patterns-guide.md
Notes and status: ready
Description: Three items present in the official docs (two types and one options field) are missing from the skill's api-reference.md:
- `McpHttpServerConfig`: HTTP-based MCP server connection type
- `permission_prompt_tool_name` field on `ClaudeAgentOptions`: the MCP tool name used for permission prompts
- `CLIConnectionError`: intermediate error class between `ClaudeSDKError` and `CLINotFoundError`
Why it matters: Users consulting the api-reference as their primary SDK docs will have an incomplete picture of available configuration options and error handling.
Suggested approach:
- Add `McpHttpServerConfig` to the MCP Server Types section in `api-reference.md`
- Add `permission_prompt_tool_name: str | None = None` to the `ClaudeAgentOptions` field listing
- Add `CLIConnectionError` to the error types section (already partially documented since `CLINotFoundError` inherits from it)
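To illustrate why the intermediate error class matters for readers of the api-reference, here is a small sketch of handler ordering. The three classes below are local stand-ins that mirror the hierarchy described in this review; they are not imported from the SDK, and the `classify` helper is hypothetical.

```python
# Stand-in classes mirroring the hierarchy described above (assumption:
# CLINotFoundError -> CLIConnectionError -> ClaudeSDKError in the real SDK).
class ClaudeSDKError(Exception):
    pass


class CLIConnectionError(ClaudeSDKError):
    pass


class CLINotFoundError(CLIConnectionError):
    pass


def classify(action) -> str:
    """Run `action` and map any SDK-style failure to a coarse category."""
    try:
        action()
    except CLINotFoundError:
        # Most specific handler first, otherwise the parent class would catch it.
        return "cli-not-found"
    except CLIConnectionError:
        return "connection-failed"
    except ClaudeSDKError:
        return "sdk-error"
    return "ok"
```

A handler written only for `CLINotFoundError` would let other connection failures fall through to a generic branch, which is the gap the missing documentation leaves open.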
Files affected: references/api-reference.md, references/tools-reference.md (MCP server section)
Notes and status: ready
Description: The skill says setting_sources=["project"] loads CLAUDE.md, but the relationship between setting_sources and system_prompt preset is underspecified. The official docs indicate that CLAUDE.md content is injected as part of the Claude Code system prompt preset, meaning both settings must work together.
Why it matters: A user who sets setting_sources=["project"] without the claude_code preset (or vice versa) will not get the expected behavior and won't understand why.
Suggested approach:
- In SKILL.md Phase 4 "With project context" section, add an explicit note: "Both `setting_sources=["project"]` AND `system_prompt={"type": "preset", "preset": "claude_code"}` are needed to load CLAUDE.md instructions into the agent."
- In `references/sessions-reference.md` or `references/api-reference.md`, clarify the dependency
- Update the validation script to check for `setting_sources` without the preset (and vice versa) as a warning
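The proposed validator warning could look something like the sketch below. It takes the options as a plain dict of keyword arguments (for example, as extracted by a validator) and assumes the dependency between `setting_sources` and the `claude_code` preset is as this review describes; `check_project_context` is a hypothetical name.

```python
from __future__ import annotations


def check_project_context(options: dict) -> list[str]:
    """Warn when only one half of the CLAUDE.md loading configuration is set."""
    sources = options.get("setting_sources") or []
    system_prompt = options.get("system_prompt")

    has_project = "project" in sources
    has_preset = (
        isinstance(system_prompt, dict)
        and system_prompt.get("type") == "preset"
        and system_prompt.get("preset") == "claude_code"
    )

    warnings = []
    if has_project and not has_preset:
        warnings.append("setting_sources includes 'project' but the claude_code preset is not set")
    if has_preset and not has_project:
        warnings.append("claude_code preset is set but setting_sources does not include 'project'")
    return warnings
```

Returning warnings rather than raising keeps the check advisory, which matches the suggestion to report this as a warning and not as a failure.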
Files affected: SKILL.md, references/api-reference.md, scripts/validate-agent-script.sh
Notes and status: ready
Description: The validate-agent-script.sh (372 lines) uses bash grep and echo | grep patterns to detect imports, decorator usage, dict literals, and anti-patterns. This is brittle against multiline code, aliased imports, conditional definitions, and string content that happens to match patterns.
Why it matters: False negatives give a false sense of correctness. False positives erode trust in the validator. Both Codex and the Claude reviewer flagged this. Specific known issues:
- `break` detection can false-positive on `break` in unrelated loops (e.g., a `for` loop processing items)
- Multiline `allowed_tools` lists may not be detected
- Aliased imports (`from claude_agent_sdk import query as q`) evade detection
Suggested approach:
- Keep the bash script as a fast "lint" pass for quick sanity checks
- Create a companion Python script (`validate_agent_ast.py`) that uses the `ast` module to:
  - Parse the file into an AST
  - Walk imports to verify `claude_agent_sdk` (not `claude_code_sdk`)
  - Find `async for` loops and check for `break` within loops that iterate over `receive_response()`
  - Check decorator usage (`@tool`) and verify `create_sdk_mcp_server` presence
  - Detect `ClaudeAgentOptions` keyword arguments for hooks/tools/permissions analysis
- Update the validation phase (Phase 7) to recommend the Python validator as primary, bash as fallback
Files affected: New file scripts/validate_agent_ast.py, scripts/validate-agent-script.sh (keep as-is), SKILL.md (Phase 7)
Notes and status: ready
Description: The 7-phase workflow is thorough but can feel heavy for experienced developers building simple agents. Codex noted it's "verbose but purposeful" and suggested a fast-track option.
Why it matters: Users who already know what they want (e.g., "build a one-shot code reviewer with structured output") shouldn't need to walk through requirements gathering and research phases.
Suggested approach:
- Add a "Fast Track" section near the top of SKILL.md, after the Overview, with a decision flowchart:
  - "Need one-shot task? -> See minimal-agent example, skip to Phase 4"
  - "Need multi-turn? -> See multi-turn-agent example, skip to Phase 4"
  - "Need hooks/custom tools? -> See complete-agent example, start at Phase 4"
  - "Complex/unfamiliar? -> Follow all 7 phases"
- Each fast-track path links directly to the relevant example and notes which phases to revisit (e.g., "After building, review Phase 5 for security hooks and Phase 7 for validation")
Files affected: SKILL.md
Notes and status: ready
Description: The 12 patterns cover one-shot, interactive, multi-agent, session, sandbox, and checkpoint workflows. Missing is an event-driven pattern where an agent watches for external triggers (file changes, queue messages, webhooks) and reacts autonomously.
Why it matters: CI/CD listeners, deployment watchers, PR review bots, and monitoring agents are common real-world use cases. Codex specifically identified this gap.
Suggested approach:
- Add Pattern 13 to `references/patterns-guide.md`: "Event-Driven Automation Agent"
- The pattern should show:
  - An outer event loop (e.g., watching a directory, polling an API, reading from a queue)
  - Creating a new `query()` call or `ClaudeSDKClient` session for each event
  - Cost tracking across events with cumulative budget enforcement
  - Graceful shutdown handling
- Example use case: Watch a directory for new `.py` files and run lint + fix on each one
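The outer loop of such a pattern could be sketched as below, using simple directory polling. The per-file handler is injected as a callable so the loop stays independent of the SDK; in a real agent it would presumably wrap a `query()` call and return that run's cost (for example, from the result message). The names `watch_directory` and `handle_file` are hypothetical.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable


async def watch_directory(
    directory: Path,
    handle_file: Callable[[Path], Awaitable[float]],
    budget_usd: float,
    stop: asyncio.Event,
    poll_interval_s: float = 1.0,
) -> float:
    """Poll `directory` for unseen .py files and handle each one once.

    `handle_file` returns the cost of handling one file. The loop stops when
    `stop` is set or when cumulative cost reaches `budget_usd`. The budget is
    checked before each file, so the final total may overshoot by one run.
    """
    seen = set()
    spent = 0.0
    while not stop.is_set():
        # Files already present at startup are treated as new on the first pass.
        for path in sorted(directory.glob("*.py")):
            if path in seen:
                continue
            if spent >= budget_usd:
                stop.set()  # budget exhausted: request shutdown
                break
            seen.add(path)
            spent += await handle_file(path)
        try:
            # Sleep until the next poll, waking early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=poll_interval_s)
        except asyncio.TimeoutError:
            pass
    return spent
```

Graceful shutdown then reduces to setting the event, for instance from a signal handler registered on the running loop. A filesystem-notification library could replace the polling without changing the budget logic.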
Files affected: references/patterns-guide.md
Notes and status: ready
Description: The skill has no mechanism to detect or flag when it falls behind the SDK. The SDK is alpha and evolving — new hook events, options fields, or API changes could silently make parts of the skill incorrect.
Why it matters: Stale documentation that looks authoritative is worse than no documentation. Users will follow outdated patterns and get confused when they don't work.
Suggested approach:
- Add a `<!-- sdk-version: X.Y.Z -->` comment at the top of SKILL.md and api-reference.md indicating the SDK version the docs were last verified against
- Add a "Last verified" line in the SKILL.md header: "Last verified against: claude-agent-sdk 0.1.x (2026-01-27)"
- Create a simple script (`scripts/check-sdk-version.sh`) that:
  - Runs `pip show claude-agent-sdk` to get the installed version
  - Compares against the documented version
  - Warns if they differ
- Document a maintenance cadence: "Review this skill against SDK changelog when the major or minor version changes"
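The version comparison could be sketched in Python as shown below, as an alternative to shelling out to `pip show`. It reads the proposed `sdk-version` marker from a document's text and compares it with the installed package using `importlib.metadata`. The helper names are hypothetical, and warning only on major or minor differences is an assumption taken from the suggested maintenance cadence.

```python
from __future__ import annotations

import re
from importlib import metadata

MARKER = re.compile(r"<!--\s*sdk-version:\s*([0-9]+(?:\.[0-9]+)*)\s*-->")


def documented_version(doc_text: str) -> str | None:
    """Extract the version from an `<!-- sdk-version: X.Y.Z -->` marker, if present."""
    match = MARKER.search(doc_text)
    return match.group(1) if match else None


def installed_version(package: str = "claude-agent-sdk") -> str | None:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def staleness_warning(documented: str | None, installed: str | None) -> str | None:
    """Return a warning string when the docs may be stale, else None."""
    if documented is None:
        return "no sdk-version marker found in the document"
    if installed is None:
        return "SDK package is not installed; cannot compare versions"
    # Compare only major.minor, so patch releases do not trigger a review.
    if documented.split(".")[:2] != installed.split(".")[:2]:
        return f"docs verified against {documented}, installed SDK is {installed}"
    return None
```

A caller would read SKILL.md, pass its text to `documented_version`, and print whatever `staleness_warning` returns.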
Files affected: SKILL.md, references/api-reference.md, new file scripts/check-sdk-version.sh
Notes and status: ready
Description: The official SDK overview mentions branding guidelines: users can say "Claude Agent" or "{YourAgentName} Powered by Claude" but should NOT say "Claude Code" or "Claude Code Agent" when naming their agents. The skill doesn't mention this.
Why it matters: Developers building production agents need to know the branding constraints to avoid naming issues. This is a simple addition.
Suggested approach:
- Add a brief note at the end of SKILL.md Phase 1 (Requirements Gathering) or Phase 4 (Build):
  - "When naming your agent, follow Anthropic's branding guidelines: use 'Claude Agent', 'Claude', or '{YourAgentName} Powered by Claude'. Do not use 'Claude Code' or 'Claude Code Agent' in agent names."
- Link to the official overview for full guidelines
Files affected: SKILL.md
Notes and status: ready
Description: The validation script's break detection is a known false-positive risk. It checks for any break statement in a file that also contains receive_response, which will flag break in completely unrelated loops.
Why it matters: False positives reduce trust in the validator and train users to ignore its warnings. This is a quick targeted fix independent of the larger AST validator effort (Item 5).
Suggested approach:
- In `validate-agent-script.sh`, replace the current heuristic:
  ```bash
  # Current (line ~350): checks for ANY break + ANY receive_response in same file
  if echo "$CONTENT" | grep -qE '^\s+break\s*$' && echo "$CONTENT" | grep -qE 'async\s+for.*receive_response'; then
  ```
  With a more targeted check that looks for `break` within 30 lines after an `async for.*receive_response` line:
  ```bash
  # Better: check for break within the body of a receive_response loop
  # (-E is required for \s+; the subshell exits 1 when no loop or no break is found)
  if echo "$CONTENT" | grep -nE 'async\s+for.*receive_response' | (
      while IFS=: read -r line_num _; do
          end_line=$((line_num + 30))
          echo "$CONTENT" | sed -n "${line_num},${end_line}p" | grep -qE '^\s+break\s*$' && exit 0
      done
      exit 1
  ); then
  ```
- This is still imperfect (the AST validator in Item 5 is the proper fix) but reduces false positives significantly
Files affected: scripts/validate-agent-script.sh
Notes and status: ready
| # | Item | Priority | Effort |
|---|---|---|---|
| 1 | Verify `UserMessage.uuid` for checkpointing | Critical | Small |
| 2 | Document `receive_messages()` vs `receive_response()` and `interrupt()` | High | Small |
| 3 | Add missing API types (`McpHttpServerConfig`, `permission_prompt_tool_name`, `CLIConnectionError`) | Medium | Small |
| 4 | Clarify CLAUDE.md loading requirements | Medium | Small |
| 5 | Harden validation with Python AST script | Medium | Medium |
| 6 | Add fast-track workflow path | Medium | Small |
| 7 | Add event-driven automation pattern | Low | Medium |
| 8 | Add SDK version tracking / staleness prevention | Low | Small |
| 9 | Add branding guidelines note | Low | Trivial |
| 10 | Improve `break` detection in validate script | Low | Small |
- Codex (via pairctl, chat `12fd57ee`): Flagged `receive_messages()`/`interrupt()` gap, validator brittleness, suggested fast-track path and event-driven pattern
- Claude (via pairctl): Verified all 9 API claim categories, flagged `UserMessage.uuid` risk, `McpHttpServerConfig` gap, and CLAUDE.md loading clarification
- Official docs: `platform.claude.com/docs/en/agent-sdk/overview`, `/python`, `/quickstart`, fetched and cross-referenced 2026-01-27