A system for evaluating whether AI agents actually learn.
A typical framing of AI agent evaluation focuses on task completion: did the agent produce the right output? As usual with such framings, the word "right" is doing enormous work. Task completion tells you whether an agent succeeded once. It says nothing about whether the agent will do better the next time.