Every Monday, the team meets to review results from the weekend compete runs -- CLI, Claude Code, and Codex CLI evaluated across Sonnet 4.6, Opus 4.7, and GPT 5.4. Today, this meeting is largely passive reporting: scores go up, scores go down, the team discusses possible causes, and then everyone disperses to work on whatever feels most promising.
What's missing is a closed loop:
Weekend Runs --> Identify Signals --> Select Investments --> Execute --> Validate