@wilmoore
Last active September 19, 2025 07:58
Business :: Ideas :: Agent-UI :: Prompt


⪼ Made with 💜 by Polyglot.

Here’s the fully revised build spec with technology choices removed. It reads as a functional blueprint: what the system must do, how components interact, and how replays fit in, while leaving all implementation details (frameworks, languages, databases) to the builder.

## Product Overview

An interactive web application for designing and debugging Mastra-style pipelines. Users can:

* Paste or describe a pipeline and see it visualized as a graph.
* Run the pipeline with sample inputs and watch live logs and metrics.
* Replay or rerun past executions to inspect or verify behavior.

Primary users: builders (design and debug) and reviewers (inspect runs).

---

## Core Architecture (conceptual)

* **Single deployable service** providing a web UI, API, and live event streaming.
* **Persistent store** for pipelines, runs, and run events.
* **In-process runner** to execute pipelines inside a controlled sandbox.
* **Real-time channel** for sending structured run events (logs, spans, status) to the UI.

No technology stack is mandated. The implementer selects language, framework, storage, and deployment environment.

---

## UI Specification

### Layout

* **Left panel**: live logs with filters (level, node, text search).
* **Middle panel**: two tabs

  * **Code tab** – editable code view with validate, apply, and run controls.
  * **Preview tab** – dynamic graph of nodes and edges, with a timeline of spans for the latest run.
* **Right panel**: metrics and, when present, evaluation (eval) results.

### Key interactions

* Paste or edit pipeline code, validate it, and preview the resulting graph.
* Describe a pipeline in natural language; receive a generated code diff for approval.
* Start a run and see logs and metrics stream live.
* Select any past run to replay or rerun.
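As a sketch of the validate step, assuming a pipeline parses to a plain node-edge structure (the names and shapes here are illustrative, not mandated by the spec), validation might check that every edge references a declared node and that the graph is acyclic:

```python
from collections import defaultdict

def validate_graph(nodes, edges):
    """Check a node-edge pipeline graph: every edge endpoint must be a
    declared node, and the graph must be acyclic (a DAG)."""
    node_set = set(nodes)
    adj = defaultdict(list)
    for src, dst in edges:
        if src not in node_set or dst not in node_set:
            return False, f"unknown node in edge {src!r} -> {dst!r}"
        adj[src].append(dst)

    # Cycle detection via iterative DFS (0 = unvisited, 1 = in stack, 2 = done).
    color = {n: 0 for n in nodes}
    for start in nodes:
        if color[start]:
            continue
        stack = [(start, iter(adj[start]))]
        color[start] = 1
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == 1:
                    return False, f"cycle involving {nxt!r}"
                if color[nxt] == 0:
                    color[nxt] = 1
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                color[node] = 2
                stack.pop()
    return True, "ok"
```

A failed check would surface in the Code tab before the user can apply or run the pipeline.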

---

## Runs and Replays

### Normal run

* User starts a run.
* The system captures the exact code snapshot, input payload, and parsed graph.
* Execution emits structured events—status, spans, logs—which are streamed to the UI and stored for later replay or analysis.
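A minimal sketch of this capture-and-emit flow, assuming a sequential pipeline and a pluggable `emit` sink (all names here are illustrative):

```python
import hashlib
import json
import time

def run_pipeline(code: str, payload: dict, nodes, emit):
    """Sketch of a run: snapshot the exact code and input, then emit
    structured events (run_status, span_start/span_end, log) per node.
    `emit` is whatever sink streams events to the UI and persists them."""
    snapshot = {
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        "input": json.dumps(payload, sort_keys=True),
    }
    seq = 0

    def event(kind, **meta):
        nonlocal seq
        seq += 1  # monotonically increasing ordering identifier
        emit({"seq": seq, "ts": time.time(), "type": kind, **meta})

    event("run_status", state="running", snapshot=snapshot)
    for node in nodes:
        event("span_start", node=node)
        event("log", node=node, level="info", msg=f"processing {node}")
        event("span_end", node=node)
    event("run_status", state="completed")
```

Because every event carries a sequence number and timestamp, the same stream serves live display, storage, and later replay.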

### Replay

* User selects a past run and chooses **Replay**.
* The system streams the saved events back to the UI, in original or accelerated timing.
* The graph and logs animate exactly as during the original run.
* No re-execution occurs.

### Rerun

* User selects **Rerun**.
* The system takes the saved code snapshot and input and performs a fresh execution.
* New events are streamed and stored, making it possible to compare with the original run.

Replays are near-instant and incur no execution cost; reruns verify determinism and current correctness.
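The replay side can be sketched as re-emitting stored events in order while reproducing their original gaps, optionally divided by a speed factor (a minimal sketch; the `sleep` parameter is injected only to make timing testable):

```python
import time

def replay(events, emit, speed=1.0, sleep=time.sleep):
    """Sketch of the replay engine: re-emit stored events in sequence
    order, reproducing the original inter-event gaps (divided by `speed`).
    No re-execution happens; the saved stream is simply played back."""
    prev_ts = None
    for ev in sorted(events, key=lambda e: e["seq"]):
        if prev_ts is not None and speed > 0:
            sleep(max(0.0, (ev["ts"] - prev_ts) / speed))
        prev_ts = ev["ts"]
        emit(ev)
```

A rerun, by contrast, would call the run engine again with the saved snapshot and input, producing a brand-new event stream to compare against this one.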

---

## Backend Behavior

* **Pipeline ingestion** – validate incoming code or generated descriptions, parse to a node-edge graph.
* **Run engine** – execute each node in sequence or in parallel as defined, emitting span and log events.
* **Replay engine** – read stored events and re-emit them over the live channel for accurate playback.
* **Metrics calculator** – compute run-level metrics such as latency, token usage, and success rate.
* **Voice/description input** – convert a natural language request into a validated pipeline description and present a diff for approval.
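Of these components, the metrics calculator is the most self-contained: it can be derived entirely from the event stream. A sketch under that assumption (field names are illustrative):

```python
def run_metrics(events):
    """Sketch of the metrics calculator: derive run-level numbers
    (per-node latency, total latency, success) from the event stream."""
    starts = {}
    metrics = {"node_latency": {}, "errors": 0}
    for ev in events:
        if ev["type"] == "span_start":
            starts[ev["node"]] = ev["ts"]
        elif ev["type"] == "span_end":
            metrics["node_latency"][ev["node"]] = ev["ts"] - starts[ev["node"]]
        elif ev["type"] == "log" and ev.get("level") == "error":
            metrics["errors"] += 1
    metrics["total_latency"] = sum(metrics["node_latency"].values())
    metrics["success"] = metrics["errors"] == 0
    return metrics
```

Token usage would be computed the same way if nodes attach token counts to their span or log metadata.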

---

## Event Model

Each run, including replays and reruns, emits a unified stream of events:

* **`run_status`** – overall state changes.
* **`span_start` / `span_end`** – begin and complete processing for each node.
* **`log`** – structured log messages tied to nodes.

Events include timestamps, ordering identifiers, and any relevant metadata to support live display and faithful replay.
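One way to sketch that unified envelope, assuming the spec's minimum of a type, ordering identifier, timestamp, and free-form metadata (the field names are illustrative):

```python
import itertools
import json
import time
from dataclasses import asdict, dataclass, field

_seq = itertools.count(1)  # process-wide ordering identifier

@dataclass
class RunEvent:
    """One entry in the unified event stream. The spec only requires a
    type, timestamp, ordering id, and metadata; names here are examples."""
    type: str      # "run_status" | "span_start" | "span_end" | "log"
    run_id: str
    seq: int = field(default_factory=lambda: next(_seq))
    ts: float = field(default_factory=time.time)
    meta: dict = field(default_factory=dict)  # node, level, message, ...

    def to_json(self) -> str:
        """Serialize for the live channel or the persistent store."""
        return json.dumps(asdict(self))
```

Keeping one schema for live runs, replays, and reruns is what lets the same UI code render all three.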

---

## Security and Safety

* Sandbox execution so untrusted code cannot access system resources or unauthorized networks.
* Optional user authentication and roles (e.g., owner, editor, viewer).
* Audit trail of sensitive actions such as applying or running a pipeline.

---

## Deployment

* Delivered as one service with built-in web UI, API endpoints, and real-time event channel.
* Implementation team is free to choose hosting environment and deployment strategy.

---

## Acceptance Criteria

* Paste code and see an accurate graph almost immediately.
* Start a run and watch logs and metrics stream in real time.
* Replay any past run with correct timing and visual fidelity.
* Optionally rerun and compare results to detect drift.
* Describe a pipeline by voice or text and apply the generated diff.

---

## Delivery Roadmap

1. **Core editor & preview** – code editing, validation, and graph rendering.
2. **Runner & streaming** – live execution and real-time event delivery.
3. **Replays & reruns** – event capture, playback, and rerun support in the UI.
4. **Voice or text pipeline generation** – natural language to code diff and apply flow.


## Waterfall

The “Waterfall” area is a time-sequence view of how the pipeline’s nodes actually ran. Think of it like the network tab in a browser dev-tools panel, but for your agent pipeline.

Here’s how it works:

* **Purpose** – it lets you see the critical path of a run. Each bar represents a node’s start and finish time. You can instantly spot which steps overlapped, which blocked the flow, and where delays occurred.

* **Live vs. replay** – during a live run, the bars grow in real time as each node starts and ends. When you replay a past run, the same bars animate exactly as they did originally, so you can watch the pipeline “re-run” without actually executing anything.

* **Debugging value** – if a pipeline slows down or fails, the waterfall helps you pinpoint whether the bottleneck was a single long-running node, an external call, or a chain of dependencies.

In short, the Waterfall panel is a visual timeline of node execution, giving you a quick, intuitive way to understand run order, overlap, and performance bottlenecks.
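The data behind those bars is simple to derive from the span events. A minimal sketch, assuming each completed node yields a `(node, start, end)` triple:

```python
def waterfall(spans):
    """Sketch of the Waterfall view's data: turn (node, start, end)
    spans into bars offset from the run's start, and flag the
    longest-running node as the likely bottleneck."""
    t0 = min(start for _, start, _ in spans)
    bars = [
        {"node": node, "offset": start - t0, "duration": end - start}
        for node, start, end in sorted(spans, key=lambda s: s[1])
    ]
    bottleneck = max(bars, key=lambda b: b["duration"])["node"]
    return bars, bottleneck
```

During a live run the same computation is applied incrementally as `span_end` events arrive; during replay it is applied to the stored spans.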
