Chris Kroenke enzokro

@enzokro
enzokro / README.md
Created January 11, 2026 00:49
The ftl Eval Harness

FTL Evaluation Harness

A system for evaluating whether AI agents actually learn.


The Problem

A typical framing of AI agent evaluation focuses on task completion: did the agent produce the right output? As is usual with such framings, the word "right" is doing enormous work. Task completion tells you whether an agent succeeded once; it says nothing about whether the agent will do better next time.
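To make the distinction concrete, here is a minimal sketch, assuming each task is attempted several times in sequence and each attempt is scored pass/fail. The names `success_rate` and `learning_delta`, and the first-half versus second-half comparison, are illustrative assumptions, not the harness's actual metric. Two agents with identical completion rates can differ completely in whether they improve.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attempts that succeeded (0.0 for an empty sequence)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def learning_delta(outcomes: Sequence[bool]) -> float:
    """Hypothetical learning signal: success rate over the later half of
    sequential attempts minus the rate over the earlier half.
    Positive values mean the agent did better on later attempts."""
    mid = len(outcomes) // 2
    return success_rate(outcomes[mid:]) - success_rate(outcomes[:mid])


# Two agents with the same overall completion rate (2 of 4 attempts).
static_agent = [True, False, True, False]
learning_agent = [False, False, True, True]

print(success_rate(static_agent), success_rate(learning_agent))      # 0.5 0.5
print(learning_delta(static_agent), learning_delta(learning_agent))  # 0.0 1.0
```

A completion-only score ranks these two agents as equal; only a measure that looks at the order of outcomes separates them.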

enzokro / 00_tether_eval.md
Last active December 31, 2025 22:51
tether Agent Orchestrator Evaluation Plan

Tether Plugin Evaluation Plan

Objective

Rigorously evaluate the tether plugin through diverse tasks, tracking workflow fidelity, workspace quality, and friction points to inform its evolution.

Design Goals:

  • Executable by any Claude Code session (portable)
  • Can run full suite or subset in parallel
  • Self-documenting: captures results for later evolution
  • Results persist in workspace/ for analysis
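A minimal sketch of the last two goals, assuming each task run is recorded as one JSON line under workspace/. The file name `eval_results.jsonl`, the helpers `record_result` and `load_results`, and the record fields are hypothetical; the plan does not specify a format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_result(workspace: Path, task_id: str, passed: bool, friction: list[str]) -> Path:
    """Append one run's outcome to a JSON Lines log in the workspace.

    Hypothetical schema: task id, pass/fail, free-text friction notes,
    and a UTC timestamp. Appending rather than overwriting keeps every
    run, so results survive across sessions for later analysis.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    log_path = workspace / "eval_results.jsonl"
    record = {
        "task_id": task_id,
        "passed": passed,
        "friction": friction,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return log_path


def load_results(workspace: Path) -> list[dict]:
    """Read every recorded run back for analysis (empty if none yet)."""
    log_path = workspace / "eval_results.jsonl"
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

One line per run keeps the log human-readable and easy to diff. If several sessions run subsets in parallel, giving each session its own log file is a simple way to avoid interleaved writes.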

Quick Start (for any Claude Code session)

enzokro / data-flow.md
Created September 12, 2025 03:37
Orchestrator and Subagents for TDD Claude Code Development

Data Flow Validator Agent - Phase 2: VERIFY UNDERSTANDING SECOND

Agent Mission

Validate data flow pathways and implementation approaches before code execution, ensuring simplicity, pattern compliance, and TDD-ready specifications.

Core Methodology

TDD + Data Flow Architecture (Preserved)

  • Data transformation focus - Input → Processing → Output clarity
  • Pattern compliance verification - Follow existing architectural approaches
enzokro / CLAUDE.md
Created September 2, 2025 21:47
Living version of CLAUDE.md

Speech Processing Service Development Guide

Project Context

Real-time speech processing service: raw audio → windowing → transcription → mind map graph construction
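The windowing stage can be sketched as splitting a stream of samples into fixed-length, overlapping windows before they are handed to transcription. The function name `window_samples`, the 16 kHz sample rate, the 2-second window, and the 0.5-second hop are illustrative assumptions; the guide does not state them.

```python
from typing import Iterator, Sequence


def window_samples(
    samples: Sequence[float],
    sample_rate: int = 16_000,  # assumed rate, not from the guide
    window_s: float = 2.0,      # assumed window length in seconds
    hop_s: float = 0.5,         # assumed hop between window starts
) -> Iterator[Sequence[float]]:
    """Yield overlapping fixed-length windows of raw audio samples.

    Consecutive windows start `hop_s` seconds apart, so each overlaps
    the previous one by (window_s - hop_s) seconds. A trailing partial
    window is dropped; a streaming service might instead buffer it
    until more audio arrives.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]
```

For example, 3 seconds of audio at the default settings (48,000 samples) yields three 2-second windows starting at 0, 0.5, and 1.0 seconds. Overlap is a common choice because it reduces the chance that a word is cut in half at a window boundary, at the cost of transcribing some audio twice.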

Core Development Principles

1. Test-Driven Development (TDD)

  • Always write tests first - One test per core behavior