@grahama1970
Last active June 19, 2025 17:20
claude polling proof of concept

Background Verification with Polling POC

Objective

Prove that we can use a background Claude Code process to verify results and block task progression until verification completes.

Important

This is a self-contained POC. All required command files are in the commands/ subdirectory.

Working Directory

All tasks should be executed with the current working directory as: /home/graham/workspace/experiments/llm_call/src/llm_call/usage/docs/tasks/claude_poll_poc/

All file paths are relative to this directory unless otherwise specified.

Generated Files

All generated files (scripts, logs, outputs) should be created in this POC directory:

  • simple_add.py - Generated Python script
  • add_results.txt - Output from simple_add.py
  • verification_status.json - Status file from verification
  • final_answer.txt - Final output file
  • execution_summary.txt - Summary for Gemini
  • task_execution.log - All logging output
  • setup_logging.py - Initial log setup script
  • verify_script.py - Generated verification script
  • poll_script.py - Generated polling script
  • Any archived logs

Tasks

Task 1: Setup Environment

  • Read /home/graham/.claude/CLAUDE.md and /home/graham/workspace/experiments/llm_call/CLAUDE.md
  • Verify the virtual environment is active (the output of which python should contain .venv)
  • To archive existing logs and initialize fresh logging, first create a Python script named setup_logging.py. This script will handle both archiving the old log and configuring the new one. Use the code snippets in commands/logging-setup.md to build this script.
  • Execute the setup script: python setup_logging.py
  • Log: "[TASK1] Environment setup complete."

Task 2: Create and Run Function

  • Write simple_add.py that adds 2 + 3
  • The script should NOT log its own source code (prevents recursive logging)
  • Execute the script and capture output
  • Verify add_results.txt contains "The sum of 2 and 3 is 5"

Task 3: Launch Background Verification and Wait for Completion

  • Create verification_status.json with initial content: {"status": "in-progress"}
  • Launch background verification:
    • Generate a new script named verify_script.py by following the instructions in commands/claude-verify.md with these parameters:
    • code_file=simple_add.py, result_file=add_results.txt, status_file=verification_status.json, log_file=task_execution.log, expected_result=5
    • Launch verify_script.py as a background process (non-blocking).
  • Wait for verification to complete:
    • Create a new script named poll_script.py. Use the full Python code provided in commands/claude-poll.md as its content, replacing the placeholder variables with these parameters:
    • status_file=verification_status.json, expected_status=pass, timeout=600, log_file=task_execution.log
    • Execute python poll_script.py and wait for it to finish. This script is designed to block until verification completes or times out.
    • After poll_script.py finishes, you MUST check its exit code. If the exit code is 0, log "[TASK3] Verification confirmed successfully." and proceed to Task 4. If the exit code is not 0 (e.g., 1 for timeout), you must stop all further tasks immediately and log the failure (e.g., '[EXECUTION_HALTED] Polling failed or timed out.').

Task 4: Continue After Verification

  • This task must only be executed if Task 3 completed with an exit code of 0.
  • Write "Paris" to final_answer.txt
  • Log: "[TASK4] Final answer written after successful verification"
  • This task demonstrates sequential execution: it runs only after polling completes successfully

Task 5: Validate with Gemini

  • Create execution_summary.txt by reading the generated files and populating the following template:
    POC Execution Summary:
    - simple_add.py created: [yes/no]
    - add_results.txt contains: [file content]
    - verification_status.json status: [final status value from file]
    - final_answer.txt contains: [file content, or "not created" if Task 4 did not run]
    - All files were created sequentially: [Answer 'yes' only if Task 4 was successfully executed, otherwise answer 'no']
    
  • To simulate the Gemini call, create a new Python script for this task. Use the full Python code from commands/ask-gemini-flash.md as its content, then execute the script.

Success Criteria

  • All tasks execute in sequence (no parallel execution)
  • Task 4 only runs after Task 3's polling confirms verification passed
  • Gemini confirms all results are real

Claude Poll POC - Background Verification with Polling

This POC demonstrates how to use background Claude Code processes for verification with polling-based synchronization, ensuring sequential task execution without race conditions.

Key Learnings

  1. Declarative Task Structure: The main task file describes WHAT to do, while helper prompts contain HOW to do it.
  2. Self-Contained Execution: All files (inputs, outputs, logs) stay within the POC directory.
  3. Hybrid Template Strategy: Intentionally uses both dynamic code generation (from prompts) and parameterized scripts to balance flexibility with reliability.
  4. Synchronization Pattern: Background process + polling ensures proper sequencing without race conditions.
  5. Clear Logging: Structured logging enables debugging and verification by humans, agents, and models.
  6. Selective Verification: A two-tiered approach (inline verification for key steps, holistic audit for all) balances rigor with efficiency.

Directory Structure

claude_poll_poc/
├── 001_claude_poll_verification_tasks.md # Main task list (declarative)
├── README.md                             # This file
├── commands/                             # Helper instruction templates
│   ├── logging-setup.md                  # Loguru configuration guide
│   ├── claude-verify.md                  # Background verification instructions
│   ├── claude-poll.md                    # Polling script instructions
│   └── ask-gemini-flash.md               # Validation instructions
├── docs/                                 # Documentation and templates
│   └── task_conversion_template.md       # Template for converting traditional task lists
└── [Generated files during execution]
    ├── setup_logging.py                  # Initial log setup script
    ├── simple_add.py                     # Generated Python script
    ├── add_results.txt                   # Output from simple_add.py
    ├── verify_script.py                  # Generated verification script
    ├── poll_script.py                    # Generated polling script
    ├── verification_status.json          # Status file (in-progress → pass/fail)
    ├── final_answer.txt                  # Proof of sequential execution
    ├── execution_summary.txt             # Summary for validation
    └── task_execution.log                # Complete execution log

Execution Flow

graph TD
  Start([Start]) --> Setup[Task 1: Setup Environment]
  Setup --> |Archives old logs| Create[Task 2: Create simple_add.py]
  Create --> |Executes script| Verify[Task 3a: Launch Background Verification]
   
  Verify --> |Non-blocking| Poll[Task 3b: Poll for Completion]
  Poll --> |Blocks until complete| Check{Verification<br/>Passed?}
   
  Check -->|Yes| Final[Task 4: Answer Question]
  Check -->|No| End([End - Failed])
   
  Final --> Validate[Task 5: Holistic Validation]
  Validate --> End2([End - Success])

  %% Styling for light/dark mode compatibility
  classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px,color:#000
  classDef process fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000
  classDef decision fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
  classDef endpoint fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000
   
  class Setup,Create,Verify,Poll,Final,Validate process
  class Check decision
  class Start,End,End2 endpoint

How It Works

1. Setup Environment (Task 1)

  • Reads CLAUDE.md files for context.
  • Verifies virtual environment is active.
  • Creates and runs a script to archive existing logs and initialize fresh logging.
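The virtual environment check in this step can also be done from Python rather than by reading the output of which python. The following is a minimal sketch; the function name venv_is_active and the marker parameter are hypothetical and not part of the POC.

```python
import sys


def venv_is_active(marker: str = ".venv") -> bool:
    """Return True if the running interpreter appears to belong to a virtual environment."""
    # Standard check: inside a venv, sys.prefix differs from sys.base_prefix.
    prefix_differs = sys.prefix != sys.base_prefix
    # The check described in Task 1: the interpreter path contains ".venv".
    path_matches = marker in sys.executable
    return prefix_differs or path_matches
```

Either condition is treated as sufficient here; a stricter setup could require both.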

2. Create and Run Function (Task 2)

  • Creates simple_add.py to perform a basic calculation.
  • Executes the script and verifies its output. This is a foundational, high-stakes task.
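For orientation, a generated simple_add.py might look roughly like the sketch below. The function name add_numbers comes from the verification prompt; write_result and its parameter are hypothetical, and the actual script is whatever Claude generates during the run.

```python
from pathlib import Path


def add_numbers(first_number: int, second_number: int) -> int:
    """Return the sum of two integers."""
    return first_number + second_number


def write_result(result_path: str = "add_results.txt") -> str:
    """Compute 2 + 3 and write the sentence Task 2 expects to result_path."""
    first_number, second_number = 2, 3
    total = add_numbers(first_number, second_number)
    message = f"The sum of {first_number} and {second_number} is {total}"
    try:
        Path(result_path).write_text(message + "\n", encoding="utf-8")
    except OSError as error:
        # Surface file errors with a non-zero exit instead of a bare traceback.
        raise SystemExit(f"Could not write {result_path}: {error}")
    return message


if __name__ == "__main__":
    print(write_result())
```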

3. Background Verification with Polling (Task 3)

  • Launch Phase: Generates and launches a background script to verify the correctness and quality of simple_add.py. This is our transactional verification.
  • Poll Phase: The main process blocks and waits for the verification to complete by polling a status file. The workflow only proceeds if the verification result is "pass".
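The launch and poll phases can be sketched with the standard subprocess module. This is an illustration under assumptions, not the POC's own code: the function name run_verification_gate and the grace_seconds parameter are hypothetical, and in the POC the agent performs these steps by running the generated scripts directly.

```python
import subprocess
from typing import Sequence


def run_verification_gate(verify_cmd: Sequence[str], poll_cmd: Sequence[str],
                          grace_seconds: float = 5.0) -> bool:
    """Start verify_cmd in the background, block on poll_cmd, report whether the gate passed."""
    # Launch phase: Popen returns immediately, so the verifier runs in the background.
    verifier = subprocess.Popen(verify_cmd)
    try:
        # Poll phase: run() blocks until the polling script exits.
        poll_result = subprocess.run(poll_cmd)
    finally:
        # Give the verifier a moment to finish logging, then reap it either way.
        try:
            verifier.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            verifier.kill()
            verifier.wait()
    # Exit code 0 from the poller is the only signal that lets the workflow proceed.
    return poll_result.returncode == 0
```

A caller would pass something like [sys.executable, "verify_script.py"] and [sys.executable, "poll_script.py"], then continue to Task 4 only when the function returns True.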

4. Answer Question & Prove Sequence (Task 4)

  • Only executes if the verification in Task 3 passed.
  • Answers a simple knowledge-based question ("What is the capital of France?").
  • Writing the answer to final_answer.txt proves that this step ran sequentially after the verification was complete. This is a low-complexity task that does not require its own inline verification.

5. Holistic Validation (Task 5)

  • Creates a summary of the entire execution.
  • Simulates a final validation call to an external model like Gemini.
  • This acts as a holistic audit, confirming the overall success and consistency of the entire task sequence by reviewing the final logs and artifacts.
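The summary for this step follows the template given in Task 5. A minimal sketch of filling it from the files on disk is shown below; build_summary and read_or_default are hypothetical names, and treating the existence of final_answer.txt as evidence that Task 4 ran is a simplifying assumption.

```python
import json
from pathlib import Path


def read_or_default(path: Path, default: str = "not created") -> str:
    """Return the stripped file content, or a default when the file is missing."""
    return path.read_text(encoding="utf-8").strip() if path.exists() else default


def build_summary(directory: str = ".") -> str:
    """Populate the Task 5 summary template from the generated files."""
    base = Path(directory)
    status = "not created"
    status_path = base / "verification_status.json"
    if status_path.exists():
        try:
            data = json.loads(status_path.read_text(encoding="utf-8"))
            status = data.get("status", "unknown") if isinstance(data, dict) else "unreadable"
        except json.JSONDecodeError:
            status = "unreadable"
    # Assumption: final_answer.txt only exists if Task 4 ran after verification.
    task4_ran = (base / "final_answer.txt").exists()
    lines = [
        "POC Execution Summary:",
        f"- simple_add.py created: {'yes' if (base / 'simple_add.py').exists() else 'no'}",
        f"- add_results.txt contains: {read_or_default(base / 'add_results.txt')}",
        f"- verification_status.json status: {status}",
        f"- final_answer.txt contains: {read_or_default(base / 'final_answer.txt')}",
        f"- All files were created sequentially: {'yes' if task4_ran else 'no'}",
    ]
    return "\n".join(lines) + "\n"
```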

Key Design Decisions

  1. A Hybrid Template Strategy ("Declarative Prompts" vs. "Declarative Code"): This POC intentionally uses two different types of instruction "templates" to balance flexibility with reliability.

    • Declarative Prompts (e.g., claude-verify.md): These are detailed specifications that instruct the AI to generate a script from scratch. This approach offers maximum flexibility, adaptability, and the potential for self-correction, making it ideal for complex, one-off, or evolving tasks.
    • Declarative Code (e.g., claude-poll.md): This is a pre-written, parameterized script. The AI's role is simply to substitute variables, not to author the logic. Because the logic is fixed and only the parameter values change, this approach is faster, cheaper, more secure, and produces the same logic on every run, making it ideal for common, high-frequency, or critical utility tasks.
  2. Why Background + Polling?: Mimics real-world async operations (like API calls or long-running jobs) while maintaining strict sequential control in the main task flow.

  3. Why Self-Contained?: Makes the POC portable, debuggable, and easy to understand by keeping all inputs, outputs, and logs in a single directory.

  4. Why Structured Logging?: Enables reliable parsing by different systems (humans, agents, verification models) for debugging and auditing.

  5. Why Selective Verification?: This POC demonstrates a two-tiered verification strategy to balance rigor with efficiency.

    • Transactional Verification (Task 3): High-stakes, complex tasks (like generating and running a script) are verified immediately using the background/poll pattern. This acts as a critical quality gate.
    • Holistic Validation (Task 5): Lower-complexity tasks (like Task 4's question-answering) do not require immediate verification. Their success is confirmed at the very end by the final audit, which reviews all evidence.
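To illustrate the structured logging point above, the sketch below parses lines written in the POC's log format ("{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}") and extracts the bracketed tag. The names LogEntry, LINE_PATTERN, and parse_tagged_lines are hypothetical; lines without a timestamp or tag, such as continuation lines of logged JSON, are skipped.

```python
import re
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    tag: str
    message: str


# Matches "2025-06-18 15:37:00.123 | [TAG] message".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \| "
    r"\[(?P<tag>[A-Z0-9_]+)\]\s*(?P<message>.*)$"
)


def parse_tagged_lines(lines: Iterable[str]) -> List[LogEntry]:
    """Return one LogEntry per line that carries a timestamp and a bracketed tag."""
    entries = []
    for line in lines:
        match = LINE_PATTERN.match(line.rstrip("\n"))
        if match:
            entries.append(LogEntry(**match.groupdict()))
    return entries
```

An auditor could, for example, confirm that a [TASK3] entry appears before any [TASK4] entry by comparing the positions of the parsed tags.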

Success Criteria

  • ✅ All tasks execute in sequence (no parallel execution of tasks).
  • ✅ Task 4 only runs after the verification in Task 3 passes.
  • ✅ No race conditions between the verification and polling scripts.
  • ✅ Clear, parseable logs for debugging and auditing.
  • ✅ All files are contained within the POC directory.
  • ✅ The final external validation confirms the results are real and coherent.

Usage

  1. Navigate to the POC directory:

    cd /home/graham/workspace/experiments/llm_call/src/llm_call/usage/docs/tasks/claude_poll_poc/
  2. Execute the tasks in 001_claude_poll_verification_tasks.md sequentially.

  3. Review generated files and task_execution.log for results.

Extension Points

This pattern can be extended for:

  • Multiple background workers with centralized status tracking.
  • Complex verification chains with dependencies.
  • Integration with real external validators (Gemini, GPT-4, etc.).
  • Distributed task execution with proper synchronization.

Task List Conversion

The docs/task_conversion_template.md file provides a template that AI models can use to convert traditional, informal task lists into the structured format demonstrated in this POC. This enables:

  • Consistent Structure: All task lists follow the same declarative pattern
  • Reusable Components: Common operations can reference helper templates
  • Clear Dependencies: Sequential requirements and exit conditions are explicit
  • Verification Integration: Tasks can include inline verification steps where needed

To convert a traditional task list:

  1. Provide the original task list to Claude or Gemini
  2. Reference the conversion template in docs/
  3. The AI will restructure it into the declarative format with proper helper references

Ask Gemini Flash Command

Ask Gemini Flash to verify execution results are real (not hallucinated).

Note: This is a simplified template for the POC. In production, use proper Gemini API integration.

Expected Output:

  • Provide clear, concise answers (limit ~500 tokens)
  • Use creative but accurate tone
  • For complex code tasks, suggest using ask-gemini-pro instead
  • If unable to answer, explain why clearly

POC Simplified Example:

# For this POC, simply log what would be sent to Gemini
from loguru import logger

# Remove default handler to prevent stderr output
logger.remove()
# Add only file handler
logger.add('task_execution.log', 
           rotation='1 MB', 
           enqueue=True, 
           backtrace=True, 
           diagnose=True,
           format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")

logger.info("=" * 80)
logger.info("🤖 GEMINI VALIDATION REQUEST")
logger.info("-" * 80)

# Read execution summary
with open('execution_summary.txt', 'r') as f:
    summary = f.read()

# Log what would be sent to Gemini
logger.info("[GEMINI_PROMPT]")
logger.info("Please verify if these results appear to be real execution outputs (not hallucinated):")
logger.info(summary)
logger.info("")
logger.info("Are these results consistent with actual code execution? Reply with YES or NO and a brief explanation.")

# In a real implementation, this would call Gemini API
# For POC, we just log that validation was requested
logger.info("[GEMINI_RESPONSE] (Simulated for POC)")
logger.info("YES - The results show consistent file creation with proper content:")
logger.info("1. simple_add.py was created and executed")
logger.info("2. add_results.txt contains the expected sum")
logger.info("3. verification_status.json shows successful validation")
logger.info("4. final_answer.txt was created after verification")
logger.info("5. Sequential execution is evidenced by the file creation order")

logger.info("✅ GEMINI VALIDATION COMPLETE")
logger.info("=" * 80)

Model Information:

  • vertex_ai/gemini-1.5-flash: Fast, cost-efficient version of Gemini 1.5
  • Good for: Code generation, analysis, general questions, quick responses
  • Features: Multi-modal support, function calling, safety settings

Environment Requirements:

  • GOOGLE_APPLICATION_CREDENTIALS: Path to service account JSON file
  • VERTEX_PROJECT: Google Cloud project ID (optional, can be in credentials)
  • VERTEX_LOCATION: Region like 'us-central1' (optional)


Usage Examples:

  • /user:ask-gemini-flash Write a Python function to calculate fibonacci
  • /user:ask-gemini-flash Explain quantum computing in simple terms
  • /user:ask-gemini-flash Debug this code: [paste code]

Notes:

  • Ensure Google Cloud credentials are properly configured
  • Flash model is optimized for speed and cost-efficiency
  • For complex reasoning tasks, consider using gemini-1.5-pro instead

Claude Poll Command

Generate a Python polling script that waits for a background Claude Code process to write a JSON status file: $ARGUMENTS

Expected Output:

  • Generate a Python script that polls for a JSON status file based on $ARGUMENTS
  • Wait for specified file to exist, then parse and check status
  • Log all polling attempts and results to the specified log file
  • Exit with appropriate code when expected status is reached or timeout occurs
  • Use $ARGUMENTS to determine file name, expected status, timeout, log file, etc.

Code Example:

# Generate a Python polling script with clear logging
# Parameters extracted from $ARGUMENTS

import json
import time
import os
import sys
from datetime import datetime
from loguru import logger

# Extract parameters from command arguments
status_file = "$status_file"
expected_status = "$expected_status"
timeout = int("$timeout")
log_file = "$log_file"

# Configure logger with clear format
# Remove default handler to prevent stderr output
logger.remove()
# Add only file handler
logger.add(log_file, rotation="1 MB", enqueue=True, backtrace=True, diagnose=True,
           format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")

logger.info("🔄 POLLING STARTED: Waiting for verification completion")
logger.info("-" * 80)

start_time = time.time()
attempt = 0

while True:
    attempt += 1
    current_time = datetime.now().strftime("%H:%M:%S.%f")[:-3]
    logger.info(f"[POLL_ATTEMPT] #{attempt} at {current_time}")
    
    # Check if file exists
    if os.path.exists(status_file):
        try:
            with open(status_file, 'r') as f:
                data = json.load(f)
            
            # Log the file content
            logger.info(f"[FILE_READ] {status_file}:")
            logger.info("```json")
            logger.info(json.dumps(data, indent=2))
            logger.info("```")
            
            # Check status
            status = data.get('status', 'unknown')
            
            if status == expected_status:
                logger.info(f"[POLL_RESULT] Status: {status} (SUCCESS!)")
                logger.info("✅ POLLING COMPLETED: Verification passed")
                logger.info("=" * 80)
                sys.exit(0)
            else:
                logger.info(f"[POLL_RESULT] Status: {status} (waiting...)")
        
        except json.JSONDecodeError as e:
            logger.error(f"[ERROR] Invalid JSON in {status_file}: {e}")
        except Exception as e:
            logger.error(f"[ERROR] Failed to read status file: {e}")
    else:
        logger.info("[POLL_RESULT] File not found (waiting...)")
    
    # Check timeout
    elapsed = time.time() - start_time
    if elapsed > timeout:
        logger.error(f"[ERROR] Timeout after {timeout} seconds")
        logger.info("❌ POLLING FAILED: Timeout reached")
        logger.info("=" * 80)
        sys.exit(1)
    
    # Wait before next poll
    time.sleep(1)

Usage for Claude Code Background Tasks:

This command generates polling scripts specifically for Claude Code instances running in the background that write JSON status files when complete.

Common Status Values:

  • "in-progress": Background task is still running
  • "pass": Verification completed successfully
  • "fail": Verification failed
  • "completed": General completion status
  • "success": Alternative success status

Environment Requirements:

  • python3: Python 3.x interpreter
  • loguru: Python logging library (install with uv add loguru or pip install loguru)

Usage Examples:

  • /user:claude-poll status_file=verification_status.json expected_status=pass timeout=600 log_file=task_execution.log
  • /user:claude-poll status_file=verification_status.json expected_status=pass timeout=300 log_file=task_execution.log
  • /user:claude-poll status_file=completion.json expected_status=completed timeout=600 log_file=task_execution.log
  • /user:claude-poll status_file=background_verification.json expected_status=success timeout=300 log_file=task_execution.log

Notes:

  • Designed specifically for Claude Code background task polling
  • Simple, reliable approach without complex features
  • Handles invalid or partially written JSON by catching json.JSONDecodeError, logging the error, and retrying on the next poll
  • Provides clear, structured logging for all polling attempts
  • Logs are parseable by agents, humans, and verification models
  • Timeout prevents infinite waiting on stuck Claude processes
  • 1-second polling interval for simplicity (no exponential backoff in this POC)
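The script above keeps polling until the timeout even when the status file already reports "fail", and it uses a fixed 1-second interval. The sketch below shows one possible variant, not part of the POC, that returns early on a failure status and backs off exponentially between attempts. The function name poll_status, the exit-code constants, and all parameters other than status_file, expected_status, and timeout are hypothetical; logging is omitted to keep the sketch free of dependencies.

```python
import json
import time
from typing import Collection

EXIT_OK, EXIT_TIMEOUT, EXIT_FAILED = 0, 1, 2


def poll_status(
    status_file: str,
    expected_status: str = "pass",
    failure_statuses: Collection[str] = ("fail",),
    timeout: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
) -> int:
    """Poll status_file until it reports success, a failure status, or the timeout expires."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            with open(status_file, "r", encoding="utf-8") as handle:
                data = json.load(handle)
            status = data.get("status", "unknown") if isinstance(data, dict) else "unknown"
        except (FileNotFoundError, json.JSONDecodeError):
            status = None  # file missing or not yet readable: keep waiting
        if status == expected_status:
            return EXIT_OK
        if status in failure_statuses:
            return EXIT_FAILED  # no point waiting for the timeout
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return EXIT_TIMEOUT
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

Both non-zero return values are compatible with the Task 3 rule that any exit code other than 0 halts the workflow; a wrapper script would pass the result to sys.exit.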

/user:claude-verify

Run a background verification of a Python script and its output file, logging every step and writing a structured JSON status file. This command uses Claude Code as a language model to critique the code quality, verify outputs, and suggest improvements, all driven by a flexible prompt with no reliance on the ast module or hardcoded scripts.


Usage

/user:claude-verify code_file=simple_add.py result_file=add_results.txt status_file=verification_status.json log_file=task_execution.log expected_result=5

Prompt Template

Replace the variables with the arguments provided to the slash command.

Verify the correctness of a Python script and its output, and critique the code quality as a language model with suggested improvements.

**Instructions:**
- Use Loguru for logging. At the top of the generated script, include:
  ```python
  from loguru import logger
  # Remove default handler to prevent stderr output
  logger.remove()
  # Add only file handler
  logger.add('$log_file', rotation="1 MB", enqueue=True, backtrace=True, diagnose=True,
             format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")
  ```
  • Log: "🔬 BACKGROUND CLAUDE VERIFICATION STARTED"
  • Log: "-" * 80 (sub-separator for clarity)
  • Read the Python file $code_file as raw text and critique its quality as a language model, focusing on:
    • Readability: Are variable names descriptive (e.g., avoid single-letter names like a or b)? Are comments present and clear?
    • Structure: Is the code logically organized? Is the function add_numbers clearly defined with appropriate inputs and outputs?
    • Error Handling: Are try-except blocks used for file operations or other risky actions?
    • Style: Does the code follow PEP 8 guidelines (e.g., indentation, line length under 79 characters)?
    • Correctness: Does the code appear to perform the intended task (add two numbers and write the result to $result_file)?
  • When reading the code file, log:
    logger.info("[CODE_ANALYSIS] Reading $code_file")
    logger.info("```python")
    logger.info(code_content)
    logger.info("```")
    
  • Run the script using python $code_file, capturing stdout, stderr, and return code with a 10-second timeout.
  • When executing, log:
    logger.info("[EXECUTION] Running python $code_file")
    logger.info("[EXECUTION_OUTPUT]")
    logger.info(f"stdout: {stdout}")
    logger.info(f"stderr: {stderr}")
    logger.info(f"return_code: {return_code}")
    
  • Read the output file $result_file and check if the result matches $expected_result.
  • When verifying output, log:
    logger.info("[FILE_VERIFICATION] Reading $result_file")
    logger.info("```")
    logger.info(file_content)
    logger.info("```")
    
  • Suggest specific code tweaks to address any issues found (e.g., add comments, improve variable names, add error handling). Format tweaks as code snippets for clarity.
  • Write a JSON status file $status_file with the following keys:
    • datetime: Current UTC timestamp in ISO format (e.g., "2025-06-18T15:37:00.123456Z").
    • critique: Detailed analysis of code quality and output verification.
    • tweaks: Suggested code improvements (code snippets or full code).
    • status: "pass" if the output in $result_file matches $expected_result and no errors occur, "fail" otherwise.
    • stdout: Output from running the code.
    • stderr: Any error messages.
  • Before writing the JSON, log it:
    logger.info("[JSON_UPDATE] Writing final verification status")
    logger.info("```json")
    logger.info(json.dumps(verification_data, indent=2))
    logger.info("```")
    
  • Write the JSON file atomically (use a temporary file and rename) to avoid race conditions with polling.
  • Log completion: "πŸ”¬ BACKGROUND VERIFICATION COMPLETE: [status]"
  • Log: "=" * 80 (section separator)

Example Generated Script Behavior:

  • Read $code_file and check for:
    • Presence of def add_numbers( to confirm the function exists.
    • Comments (e.g., # or ''') for readability.
    • Single-letter variable names (e.g., a, b) and suggest descriptive names (e.g., num1, num2).
    • Try-except blocks for file operations.
    • Line lengths exceeding 79 characters (PEP 8).
  • Run python $code_file and capture outputs.
  • Compare $result_file content with $expected_result.
  • Example critique: "Function add_numbers found. No comments found; consider adding explanatory comments. Single-letter variable names (a, b) reduce readability."
  • Example tweaks:
    Add comments:
    ```python
    # Add two numbers and return the sum
    def add_numbers(a, b):
        return a + b
    ```
    Use descriptive names:
    ```python
    def add_numbers(num1, num2):
        return num1 + num2
    ```
  • Write $status_file atomically with critique, tweaks, status, stdout, and stderr.

Constraints:

  • Do not use the ast module or other parsing libraries for code analysis; rely on raw text analysis and language model capabilities.
  • Ensure all file paths and parameters ($code_file, $result_file, $status_file, $log_file, $expected_result) are substituted from the slash command.
  • Handle errors gracefully (e.g., file not found, invalid output) and include them in the critique.
  • Keep the generated script self-contained with no external dependencies beyond loguru, subprocess, json, datetime, sys, and os.
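The mechanical parts of the prompt above (running the script with a 10-second timeout, comparing the result file, and writing the status file atomically) can be sketched as follows. This is an illustration under assumptions rather than the script Claude would generate: the function names are hypothetical, logging is omitted, the critique and tweaks fields are left empty because they come from the language model, and "matches" is interpreted as the last whitespace-separated token of the result file being equal to expected_result.

```python
import json
import os
import subprocess
import sys
import tempfile
from datetime import datetime, timezone


def write_json_atomically(path: str, payload: dict) -> None:
    """Write payload to path so that a polling reader never sees a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives next to the target so the rename stays on one filesystem.
    descriptor, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(descriptor, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
        os.replace(temp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


def verify_and_report(code_file: str, result_file: str, status_file: str,
                      expected_result: object) -> dict:
    """Run code_file, compare result_file with expected_result, and write status_file."""
    try:
        completed = subprocess.run(
            [sys.executable, code_file],
            capture_output=True, text=True, timeout=10,
        )
        stdout, stderr, return_code = completed.stdout, completed.stderr, completed.returncode
    except subprocess.TimeoutExpired:
        stdout, stderr, return_code = "", "Timed out after 10 seconds", None

    try:
        with open(result_file, "r", encoding="utf-8") as handle:
            tokens = handle.read().split()
    except OSError as error:
        tokens = []
        stderr = f"{stderr}\n{error}".strip()

    # Assumption: the result "matches" when the file's last token equals expected_result.
    matches = bool(tokens) and tokens[-1] == str(expected_result)
    payload = {
        "datetime": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "critique": "",  # supplied by the language model in the real workflow
        "tweaks": "",    # supplied by the language model in the real workflow
        "status": "pass" if return_code == 0 and matches else "fail",
        "stdout": stdout,
        "stderr": stderr,
    }
    write_json_atomically(status_file, payload)
    return payload
```

Writing to a temporary file and renaming it means the polling script reads either the previous {"status": "in-progress"} content or the complete final JSON, never a half-written file.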

Logging Setup Helper

Standard loguru configuration for all scripts in this POC.

from loguru import logger
import datetime
import os

# Remove default handler to prevent stderr output
logger.remove()

# Add only file handler
logger.add('task_execution.log', 
           rotation='1 MB', 
           enqueue=True, 
           backtrace=True, 
           diagnose=True,
           format="{time:YYYY-MM-DD HH:mm:ss.SSS} | {message}")

For archiving old logs:

if os.path.exists("task_execution.log"):
    archive_name = f"task_execution_archive_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
    os.rename("task_execution.log", archive_name)
    logger.info(f"[LOG_ARCHIVED] Previous log moved to {archive_name}")
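When the two snippets above are combined into setup_logging.py, the order matters. Assuming loguru opens the log file as soon as logger.add is called, archiving after configuring would rename the file the new sink is already writing to. A minimal sketch of the archive step as a standalone function is shown below; archive_existing_log is a hypothetical name, and the loguru calls appear only as comments so the sketch runs without the dependency.

```python
import datetime
import os
from typing import Optional


def archive_existing_log(log_path: str = "task_execution.log") -> Optional[str]:
    """Rename an existing log to a timestamped archive name; return that name, or None."""
    if not os.path.exists(log_path):
        return None
    stem, extension = os.path.splitext(log_path)
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_name = f"{stem}_archive_{timestamp}{extension}"
    os.rename(log_path, archive_name)
    return archive_name


# Intended call order in setup_logging.py:
#   archived = archive_existing_log("task_execution.log")   # 1. archive first
#   logger.remove()                                          # 2. then configure loguru
#   logger.add("task_execution.log", ...)                    #    (arguments as in the helper above)
#   if archived:
#       logger.info(f"[LOG_ARCHIVED] Previous log moved to {archived}")
```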

1. The Rules File (tasks_template_rules.md)

This file contains the codified principles we've developed. It should be saved and provided to Claude as context.

# Rules for Creating Robust, Verifiable Task Lists

## Preamble
You are an expert AI workflow designer. When rewriting a task list according to these rules, your goal is to produce a plan that is unambiguous, deterministic, auditable, and resilient. Each instruction must be a direct, procedural command.

---

### Rule 1: Standard File Structure
Every task list file MUST begin with the following standard headers. Populate them based on the task's context.

-   `# [Title of the Workflow]`
-   `## Objective`: A one-sentence summary of the goal.
-   `## Working Directory`: The absolute path where all commands should be run.
-   `## Generated Files`: A bulleted list of all files that will be created during execution.
-   `## Tasks`: The main body of the workflow.

### Rule 2: Atomic and Procedural Tasks
Break down every high-level goal into its smallest, most explicit, and purely procedural steps.
-   **Bad (Vague):** "Process the file."
-   **Good (Procedural):**
    -   "Create a script named `process_data.py`."
    -   "Add a function to the script to read `input.csv`."
    -   "Execute the script: `python process_data.py`."
    -   "Verify the output file `results.txt` exists."

### Rule 3: Hybrid Command Strategy
Use the `commands/` directory for helper instructions, employing a hybrid strategy:
-   **For High-Complexity or Evolving Tasks (Declarative Prompts):** Instruct the AI to *generate a new script* based on a detailed prompt template (e.g., `commands/claude-verify.md`). This provides flexibility.
-   **For Common, Stable Tasks (Declarative Code):** Instruct the AI to *create a script by copying the full code* from a helper file and substituting variables (e.g., `commands/claude-poll.md`). This provides speed, reliability, and security.

### Rule 4: The Core Verification Pattern
For any task identified as high-risk, complex, or foundational, you MUST apply the full `Do -> Verify -> Poll -> Proceed` pattern. Structure this as three explicit sub-tasks:

1.  **Launch Background Verification:**
    -   Generate a verification script from a `claude-verify.md`-style helper.
    -   Launch this script as a non-blocking background process.

2.  **Wait for Completion:**
    -   Generate a polling script from a `claude-poll.md`-style helper.
    -   Execute this script and block until it completes.

3.  **Check Exit Code and Proceed Conditionally:**
    -   After the poll script finishes, add a mandatory step to check its exit code.
    -   Use the exit code to determine the next action: `If the exit code is 0, proceed... If the exit code is not 0, stop all further tasks...`.

### Rule 5: Selective Verification
Apply Rule 4 strategically. Do not verify every step.
-   **Apply to:** Code generation, critical data transformations, API calls with important results, steps that are prerequisites for many others.
-   **Skip for:** Simple file I/O (e.g., writing a known string), simple knowledge lookups (like answering a question), or steps where failure is easily caught by the final audit.

### Rule 6: Mandatory and Structured Logging
Every single action, no matter how small, MUST be followed by a `Log:` instruction.
-   Logs MUST have a structured prefix indicating their context (e.g., `[TASK1]`, `[VERIFY]`, `[POLL_ATTEMPT]`, `[EXECUTION_HALTED]`).
-   This creates a complete, machine-parseable audit trail.

### Rule 7: Final Holistic Audit
Every task list MUST conclude with a final validation task.
-   This task's purpose is to summarize the entire execution into a single report (`execution_summary.txt`).
-   It should then simulate (or actually call) an external validator (like Gemini) to review this summary and the logs for overall coherence and success.

### Rule 8: No Ambiguity
-   All file paths should be absolute or clearly relative to the specified `Working Directory`.
-   Avoid descriptive or conversational text within the task steps. Be a commander, not a commentator.

2. The Prompt Template (For the User)

This is the template a user would employ to transform their basic ideas into a robust plan.

You are an expert AI workflow designer specialized in creating detailed, robust, and verifiable execution plans.

Your task is to take a basic, high-level list of tasks and rewrite it into a formal task list that complies with a strict set of architectural rules.

The rules are defined in the file `tasks_template_rules.md`, which I have provided below.

<RULES_FILE>
---
# Rules for Creating Robust, Verifiable Task Lists

[... Paste the full content of tasks_template_rules.md here ...]
---
</RULES_FILE>

Now, analyze the following basic task list and rewrite it to fully comply with all the rules you have been given.

<BASIC_TASK_LIST>
---
[... User pastes their simple, high-level task list here ...]
---
</BASIC_TASK_LIST>

**Your Process:**

1.  First, thoroughly study the principles in `<RULES_FILE>`.
2.  Next, analyze the user's high-level goals in `<BASIC_TASK_LIST>`.
3.  Rewrite the basic list into a new, detailed task list that follows every rule.
4.  Identify the most complex or critical step and apply the full `Do -> Verify -> Poll -> Proceed` pattern (Rule 4).
5.  For simpler steps, ensure they are procedural and have structured logging, but skip the inline verification (Rule 5).
6.  Ensure the plan concludes with a final holistic audit task (Rule 7).

Your final output should be a single, complete, unabridged markdown file containing the new, compliant task list. Do not add any conversational text before or after the markdown.
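Filling in the template above is plain text substitution. The sketch below shows one way to do it; the function name build_conversion_prompt is hypothetical, and the two placeholder constants are copied from the template text.

```python
RULES_PLACEHOLDER = "[... Paste the full content of tasks_template_rules.md here ...]"
TASKS_PLACEHOLDER = "[... User pastes their simple, high-level task list here ...]"


def build_conversion_prompt(template: str, rules_text: str, basic_task_list: str) -> str:
    """Insert the rules file and the user's task list into the prompt template."""
    for placeholder in (RULES_PLACEHOLDER, TASKS_PLACEHOLDER):
        if placeholder not in template:
            # Fail loudly rather than send a prompt with a missing section.
            raise ValueError(f"Template is missing placeholder: {placeholder}")
    filled = template.replace(RULES_PLACEHOLDER, rules_text.strip())
    return filled.replace(TASKS_PLACEHOLDER, basic_task_list.strip())
```

Raising on a missing placeholder is a deliberate choice here: a silently incomplete prompt would be harder to notice than an early error.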

Example Usage

A user wants to automate a data processing task. They would fill out the prompt template like this:

User's Filled-Out Prompt:

You are an expert AI workflow designer...

<RULES_FILE>
---
# Rules for Creating Robust, Verifiable Task Lists

[... Full content of tasks_template_rules.md ...]
---
</RULES_FILE>

Now, analyze the following basic task list and rewrite it to fully comply with all the rules you have been given.

<BASIC_TASK_LIST>
---
1. Download a user data CSV from 'https://example.com/users.csv'.
2. Write a Python script to calculate the average age from the 'age' column.
3. Save the result to a file called 'average_age.txt'.
---
</BASIC_TASK_LIST>

**Your Process:**
...

Expected Claude Output: Claude would then generate a complete, robust markdown file that looks very much like your final claude_poll_poc task list, but for this new CSV processing task. It would identify step #2 (writing the Python script) as the high-risk action and wrap it in the full verification pattern, while treating the download and final save steps as simpler actions requiring only logging.
