Type: System Design
Date: 2026-04-17
Status: Design Phase
Related: [[research-queue|Research Queue]], [[IRIS|Wiki Schema]]
How should Iris automate wiki operations (transcript ingestion, consolidation, extraction) in a way that's:
- Portable (works locally for testing, deploys to server)
- Visible (track tokens, tool calls, operations)
- Flexible (handle both known patterns and ad-hoc tasks)
- Maintainable (version controlled, discoverable)
1. Skills (Portable Automation)
- Built on agentskills.io spec (Iris already supports this!)
- Installed via `npx skills add`
- Discoverable, versioned, shareable
- Skills invoke PHP scripts (not bash)
2. PHP Scripts (Laravel-Powered)
- Bootstrap Laravel for full framework access
- Live in `scripts/` directory (git-tracked)
- Use Http, Storage, DB, Cache, Prism
- Portable across environments
3. Cron Scheduling
- Laravel Task Scheduler for recurring tasks
- Potentially skill-based dynamic scheduling
- Visible in code, version controlled
Instead of bash, use PHP that bootstraps Laravel:

```php
#!/usr/bin/env php
<?php

require __DIR__.'/../vendor/autoload.php';

use Illuminate\Contracts\Console\Kernel;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;
// Prism imports — exact namespaces depend on the installed Prism version
use Prism\Prism\Enums\Provider;
use Prism\Prism\Prism;

$app = require_once __DIR__.'/../bootstrap/app.php';
$app->make(Kernel::class)->bootstrap();

// Full Laravel + Prism power available
$transcript = Http::get('https://api.podcast.com/transcript/40')->body();

Storage::disk('local')->put(
    'wiki/raw/transcripts/episode-40.md',
    $transcript
);

// Use Prism to extract strategies — the transcript rides along in the
// prompt, since generate() itself takes no arguments
$response = Prism::text()
    ->using(Provider::Anthropic, 'claude-3-5-sonnet-20241022')
    ->withPrompt("Extract LLM strategies...\n\n".$transcript)
    ->generate();

// Update wiki
file_put_contents(
    __DIR__.'/../wiki/wiki/Research/LLM-Strategies/from-podcast-40.md',
    $response->text
);
```

Why this is powerful:
- Full Laravel ecosystem (Http, Storage, DB, Cache)
- Prism for LLM calls (critical for wiki intelligence)
- Portable (same code works local + server)
- Git-tracked and version controlled
- Easy to review (TJ knows Laravel/PHP)
```
~/wiki/
├── IRIS.md
├── index.md
├── log.md
├── scripts/                  # PHP automation scripts
│   ├── ingest-podcast-batch.php
│   ├── extract-strategies.php
│   └── consolidate-memories.php
├── raw/
├── wiki/
└── .agents/
    └── skills/               # Installed agentskills
        ├── podcast-processor/
        └── wiki-consolidator/
```
Skills = Production automation (tested, versioned, reusable)
Scripts = Development/exploration (iterate fast, promote when proven)
Git = Deployment mechanism (commit → push → works on server)
Core principle: Iris is built on visibility. Wiki operations should be fully transparent.
Key insight: Iris already has token tracking via TokenMeter service - just use it!
Iris has a token_usages table and TokenMeter service that tracks all LLM usage. Wiki scripts should use this instead of building a parallel system.
Proposal: Make TokenMeter generic
Instead of requiring a new method for every source type, add a generic record() method:
```php
// In App\Services\TokenMeter
public function record(
    int $userId,
    string $sourceType, // Any string: 'wiki-ingest', 'wiki-consolidate', etc.
    TokenUsageData $usage,
    string $model,
    string $provider,
    ?int $conversationId = null
): TokenUsage {
    return TokenUsage::create([
        'user_id' => $userId,
        'conversation_id' => $conversationId,
        'source_type' => $sourceType,
        'model' => $model,
        'provider' => $provider,
        'prompt_tokens' => $usage->promptTokens,
        'completion_tokens' => $usage->completionTokens,
        'cache_write_tokens' => $usage->cacheWriteTokens,
        'cache_read_tokens' => $usage->cacheReadTokens,
        'thought_tokens' => $usage->thoughtTokens,
    ]);
}
```

A wiki script then records its usage through the same service:

```php
#!/usr/bin/env php
<?php

require __DIR__.'/../vendor/autoload.php';

use App\Services\TokenMeter;
use App\ValueObjects\TokenUsageData;
use Illuminate\Contracts\Console\Kernel;
use Prism\Prism\Prism;
use Prism\Prism\ValueObjects\Messages\UserMessage; // namespace varies by Prism version

$app = require_once __DIR__.'/../bootstrap/app.php';
$app->make(Kernel::class)->bootstrap();

$tokenMeter = app(TokenMeter::class);
$startTime = now();

try {
    // Do the work. Prism treats prompt and messages as mutually exclusive,
    // so the task instruction goes in as a user message alongside
    // wikiPrompts() (hypothetical helper returning wiki system messages).
    // $transcript is assumed to have been fetched earlier in the script.
    $result = Prism::text()
        ->using('anthropic', 'claude-sonnet-4-5')
        ->withMessages([
            ...wikiPrompts(),
            new UserMessage("Extract strategies from podcast transcript...\n\n".$transcript),
        ])
        ->generate();

    // Record token usage via existing system
    $tokenMeter->record(
        userId: auth()->id(), // CLI has no authenticated user — pass an explicit id in practice
        sourceType: 'wiki-ingest', // Custom source type string
        usage: new TokenUsageData(
            promptTokens: $result->usage()->inputTokens,
            completionTokens: $result->usage()->outputTokens,
            cacheReadTokens: $result->usage()->cacheReadTokens ?? 0,
            cacheWriteTokens: $result->usage()->cacheWriteTokens ?? 0,
        ),
        model: $result->model(),
        provider: $result->provider()
    );

    echo "✓ Processed in ".$startTime->diffInSeconds(now())."s\n";
} catch (\Exception $e) {
    echo "✗ Failed: {$e->getMessage()}\n";
    throw $e;
}
```

If you need more context than just tokens (episodes processed, what strategies were extracted, etc.), use JSONL (newline-delimited JSON) for append-only operation logs:
```
~/wiki/
├── operations.jsonl   # Single file, one operation per line
```
Each line is a complete JSON object:
```json
{"timestamp":"2026-04-17T18:49:00Z","operation":"ingest","source":"podcast","context":{"episodes":"40-50"},"status":"success","duration_s":222}
{"timestamp":"2026-04-17T19:23:00Z","operation":"consolidate","status":"success","duration_s":45}
```

Benefits:
- ✅ Append-only (fast, concurrent-safe)
- ✅ Easy to parse (`jq`, `grep`, or PHP)
- ✅ Git-friendly (diffs show new lines added)
- ✅ No schema needed
- ✅ Optional - only if you need operation-level context beyond tokens
Script appends to JSONL:
```php
$logEntry = json_encode([
    'timestamp' => now()->toIso8601String(),
    'operation' => 'ingest',
    'source' => 'podcast',
    'context' => ['episodes' => '40-50'],
    'status' => 'success',
    'duration_s' => $startTime->diffInSeconds(now()),
]);

file_put_contents(
    __DIR__.'/../wiki/operations.jsonl',
    $logEntry."\n",
    FILE_APPEND
);
```

Also append to `wiki/log.md`:
```markdown
## [2026-04-17 18:49] automated | Podcast Batch Ingest

Script: `scripts/ingest-podcast-batch.php`
Episodes: 40-50
Status: ✓ Success
Tokens: 45.2k input, 12.8k output (2.1k cached)
Tool calls: 11 HTTP requests
Duration: 3m 42s
```

Query the existing `token_usages` table:
```php
use App\Models\TokenUsage;

// Get wiki-related token usage
$wikiUsage = TokenUsage::forUser(auth()->id())
    ->where('source_type', 'LIKE', 'wiki-%') // All wiki source types
    ->latest()
    ->take(50)
    ->get();

// Calculate stats
$totalTokens = $wikiUsage->sum(fn ($usage) => $usage->totalTokens());
$totalCost = calculateCost($totalTokens);
$cacheHitRate = $wikiUsage->sum('cache_read_tokens') / $wikiUsage->sum('prompt_tokens');

// Group by source type
$bySource = $wikiUsage->groupBy('source_type')
    ->map(fn ($group) => $group->sum(fn ($u) => $u->totalTokens()));

return view('wiki.token-usage', compact('wikiUsage', 'totalTokens', 'totalCost', 'bySource'));
```

Dashboard shows:
- Token usage by wiki operation type (ingest, consolidate, extract)
- Cache hit rates
- Cost over time
- All using Iris's existing table
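Note that `calculateCost()` isn't defined anywhere in Iris yet — a minimal sketch, with per-million-token rates as placeholder assumptions (real pricing should live in config, per model). A two-argument signature is suggested here because a single combined token total can't be priced accurately when input and output rates differ:

```php
// Hypothetical helper — not part of Iris. Rates are assumptions (USD per
// million tokens); pull real rates from config or the provider's rate card.
function calculateCost(int $promptTokens, int $completionTokens): float
{
    $inputPerMillion  = 3.00;  // assumed input rate
    $outputPerMillion = 15.00; // assumed output rate

    return $promptTokens / 1_000_000 * $inputPerMillion
         + $completionTokens / 1_000_000 * $outputPerMillion;
}
```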
Optional JSONL for operation context: If you need richer context (what episodes, what failed, etc.), read the JSONL file:
```php
$operations = collect(file(__DIR__.'/../wiki/operations.jsonl'))
    ->map(fn ($line) => json_decode($line, true))
    ->sortByDesc('timestamp')
    ->take(50);
```

Option 1: Laravel Task Scheduler

```php
// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    $schedule->exec('php scripts/process-new-transcripts.php')
        ->hourly()
        ->description('Check for new podcast episodes');

    $schedule->exec('php scripts/wiki-consolidation.php')
        ->dailyAt('23:00')
        ->description('Nightly memory consolidation');
}
```

Run via `php artisan schedule:work` (or system cron)
Pros: Laravel-native, visible in code, version controlled
Cons: Requires modifying Kernel.php (not dynamic)
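For the system-cron route, Laravel's documented crontab entry works as-is (the project path is a placeholder):

```
* * * * * cd /path-to-your-project && php artisan schedule:run >> /dev/null 2>&1
```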
Option 2: Create a cron-manager skill that Iris can invoke:

```php
schedule_task('daily', '23:00', 'php scripts/wiki-consolidation.php');
```

The skill writes to a config file that the scheduler loads.
Pros: Dynamic scheduling from conversations
Cons: More complex to build
Decision: Start with Option 1, add Option 2 if needed
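If Option 2 is built later, the scheduler side can stay small — a sketch, assuming the skill writes a JSON config file at `wiki/schedule.json` (file name and format are assumptions):

```php
// app/Console/Kernel.php — sketch of the Option 2 loader.
// Assumed entry format: [{"cron": "0 23 * * *", "command": "php scripts/wiki-consolidation.php"}]
protected function schedule(Schedule $schedule)
{
    $path = base_path('wiki/schedule.json'); // assumed location

    if (! file_exists($path)) {
        return;
    }

    // Register each skill-written task with Laravel's native scheduler
    foreach (json_decode(file_get_contents($path), true) ?? [] as $task) {
        $schedule->exec($task['command'])->cron($task['cron']);
    }
}
```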
OpenClaw:
- Discord channels per project
- Obsidian vault with `_context.md` / `_last-session.md`
- Skills from ClawHub (cron, automation, etc.)
- Skills handle automation
Iris + Wiki:
- Laravel-based with artisan commands
- Wiki markdown files with IRIS.md schema
- Skills from agentskills.io
- PHP scripts + Skills = automation layer
Key similarity: Skills are the portable automation contract
Key difference: PHP/Laravel power instead of language-agnostic scripts
Request: "Ingest all Slightly Caffeinated episodes 40-50 and extract LLM strategies"
Iris:
- Check for `podcast-processor` skill → doesn't exist yet
- Write `scripts/ingest-podcast-batch.php` (PHP, bootstraps Laravel)
- Test locally: `php scripts/ingest-podcast-batch.php 40 50`
- Works → commit to git → TJ reviews
- TJ deploys → script available on server
- Create `wiki_operations` record tracking tokens/tools
- Log to `wiki/log.md`
- Next request for episodes 51-60? Use same script
After 3rd similar request: "This pattern keeps coming up. Should we make a proper skill for podcast processing?"
Promote script → skill:
- Create skill structure in `.agents/skills/podcast-processor/`
- Skill invokes the PHP script
- Now discoverable via agentskills.io spec
✅ Portable - PHP/Laravel works everywhere
✅ Powerful - Full framework + Prism access
✅ Visible - DB tracking, wiki logging, UI dashboard
✅ Testable - TJ runs scripts locally before deploy
✅ Git-tracked - Version controlled, reviewable
✅ Discoverable - Skills layer provides catalog
✅ Maintainable - Laravel patterns, not one-off bash
Problem: SystemPromptBuilder includes ALL Iris prompts:
- IrisStaticPrompt (persona + truth system + memory system + tool protocols)
- MemoryPrompt (user's personal memories)
- CalendarPrompt (user's calendar events)
- SkillsPrompt (available skills)
- CurrentTimePrompt (temporal context)
- etc.
Wiki scripts only need:
- ✅ Iris persona (personality, communication style)
- ✅ Current timestamp (temporal context)
- ❌ NOT user memories (wiki operations aren't personal)
- ❌ NOT calendar (wiki doesn't need user's schedule)
- ❌ NOT tool protocols (scripts don't invoke tools interactively)
Current structure:
IrisStaticPrompt (one big blob):
- Persona/personality
- @include('prompts.truth-system')
- @include('prompts.memory-system')
- Tool invocation protocol
- Tool response protocol
Proposed structure:
IrisBasePersonaPrompt (just personality)
TruthSystemPrompt (truth instructions)
MemorySystemPrompt (memory instructions)
ToolProtocolPrompt (tool usage protocols)
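Each extracted prompt class can stay tiny. A sketch — the `asMessages()` return shape, the `SystemMessage` namespace, and the `prompts.iris-persona` view name are all assumptions, not Iris's actual interface:

```php
<?php

namespace App\Prompts;

// Sketch only: assumes each focused prompt class exposes the asMessages()
// contract that wiki scripts rely on. The SystemMessage class path varies
// by Prism version.
use Prism\Prism\ValueObjects\Messages\SystemMessage;

class IrisBasePersonaPrompt
{
    /** @return array<int, SystemMessage> */
    public function asMessages(): array
    {
        return [
            new SystemMessage(view('prompts.iris-persona')->render()),
        ];
    }
}
```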
```php
#!/usr/bin/env php
<?php

require __DIR__.'/../vendor/autoload.php';

use App\Prompts\CurrentTimePrompt;
use App\Prompts\IrisBasePersonaPrompt;
use Illuminate\Contracts\Console\Kernel;
use Prism\Prism\Prism;
use Prism\Prism\ValueObjects\Messages\UserMessage; // namespace varies by Prism version

$app = require_once __DIR__.'/../bootstrap/app.php';
$app->make(Kernel::class)->bootstrap();

// Get just the prompts wiki scripts need
$systemMessages = [
    app(IrisBasePersonaPrompt::class)->asMessages()[0], // Iris personality
    app(CurrentTimePrompt::class)->asMessages()[0],     // Timestamp
];

// Add task-specific instruction. Prism treats prompt and messages as
// mutually exclusive, so the task — and the transcript — goes in as a
// user message rather than via withPrompt()
$taskInstruction = "Extract all LLM strategies from this podcast transcript...";

$result = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-5')
    ->withMessages([
        ...$systemMessages,
        new UserMessage($taskInstruction."\n\n".$transcript),
    ])
    ->generate();
```

This also solves [[technical-issues#prompt-cache-hygiene|Prompt Cache Hygiene]]!
Breaking apart IrisStaticPrompt into smaller, focused prompts allows for granular cache blocks:
```php
'prompts' => [
    IrisBasePersonaPrompt::class, // Cache block 1 (static)
    TruthSystemPrompt::class,     // Cache block 2 (static)
    MemorySystemPrompt::class,    // Cache block 3 (static)
    ToolProtocolPrompt::class,    // Cache block 4 (static)

    // --- dynamic prompts below ---
    MemoryPrompt::class,          // Dynamic (no cache)
    CalendarPrompt::class,        // Dynamic (no cache)
    CurrentTimePrompt::class,     // Dynamic (no cache)
],
```

Benefits:
- ✅ Wiki scripts get lightweight, focused prompts
- ✅ Better cache efficiency (4 smaller cache blocks vs 1 giant one)
- ✅ Self-hosters can customize which prompts to cache
- ✅ Reduces cache busting from dynamic content
Two problems, one solution.
- How to handle script failures? Retry logic?
- Should scripts run via queue jobs for async processing?
- What's the right abstraction for "wiki helpers" that scripts can import?
- How to handle secrets (API keys) in scripts?
- Should there be a skill registry/catalog UI in Iris?
- Prototype the pattern - Build one PHP script that demonstrates:
  - Laravel bootstrap
  - Prism usage
  - WikiOperation tracking
  - Wiki log appending
- Test locally - Verify portability
- Deploy to server - Verify it works in production
- Iterate - Refine based on real usage
- Build skill wrapper - Once pattern proven, wrap in skill
Status: Design documented, ready for prototyping