This build spec is a functional blueprint: it describes what the system must do, how its components interact, and how replays fit in, while leaving all implementation details (frameworks, languages, databases) to the builder.
## Product Overview
An interactive web application for designing and debugging Mastra-style pipelines. Users can:
* Paste or describe a pipeline and see it visualized as a graph.
* Run the pipeline with sample inputs and watch live logs and metrics.
* Replay or rerun past executions to inspect or verify behavior.
Primary users: builders (design and debug) and reviewers (inspect runs).
---
## Core Architecture (conceptual)
* **Single deployable service** providing a web UI, API, and live event streaming.
* **Persistent store** for pipelines, runs, and run events.
* **In-process runner** to execute pipelines inside a controlled sandbox.
* **Real-time channel** for sending structured run events (logs, spans, status) to the UI.
No technology stack is mandated. The implementer selects language, framework, storage, and deployment environment.
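As one illustration of these contracts, the store and runner could be sketched as interfaces. All names here (`PipelineStore`, `Runner`, `MemoryStore`) are hypothetical and not part of the spec; a real implementation would pick its own shapes and a durable backing store:

```typescript
// Conceptual contract for the persistent store: pipelines, plus events per run.
interface PipelineStore {
  savePipeline(id: string, code: string): void;
  loadPipeline(id: string): string | undefined;
  saveRunEvent(runId: string, event: object): void;
  loadRunEvents(runId: string): object[];
}

// Conceptual contract for the in-process runner: execute a code snapshot
// against an input, emitting structured events as it goes.
interface Runner {
  execute(code: string, input: object, emit: (e: object) => void): void;
}

// A trivial in-memory store satisfying the contract, for illustration only.
class MemoryStore implements PipelineStore {
  private pipelines = new Map<string, string>();
  private events = new Map<string, object[]>();
  savePipeline(id: string, code: string) { this.pipelines.set(id, code); }
  loadPipeline(id: string) { return this.pipelines.get(id); }
  saveRunEvent(runId: string, event: object) {
    const list = this.events.get(runId) ?? [];
    list.push(event);
    this.events.set(runId, list);
  }
  loadRunEvents(runId: string) { return this.events.get(runId) ?? []; }
}
```

The key design point is that the replay engine only ever needs `loadRunEvents`, so any store that preserves event order is sufficient.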
---
## UI Specification
### Layout
* **Left panel**: live logs with filters (level, node, text search).
* **Middle panel**: two tabs
  * **Code tab** – editable code view with validate, apply, and run controls.
  * **Preview tab** – dynamic graph of nodes and edges, with a timeline of spans for the latest run.
* **Right panel**: metrics and, when present, evaluation (eval) results.
### Key interactions
* Paste or edit pipeline code, validate it, and preview the resulting graph.
* Describe a pipeline in natural language; receive a generated code diff for approval.
* Start a run and see logs and metrics stream live.
* Select any past run to replay or rerun.
---
## Runs and Replays
### Normal run
* User starts a run.
* The system captures the exact code snapshot, input payload, and parsed graph.
* Execution emits structured events—status, spans, logs—which are streamed to the UI and stored for later replay or analysis.
### Replay
* User selects a past run and chooses **Replay**.
* The system streams the saved events back to the UI, in original or accelerated timing.
* The graph and logs animate exactly as during the original run.
* No re-execution occurs.
### Rerun
* User selects **Rerun**.
* The system takes the saved code snapshot and input and performs a fresh execution.
* New events are streamed and stored, making it possible to compare with the original run.
Replays are near-instant and incur no execution cost; reruns verify determinism and current correctness.
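The timing side of replay reduces to a pure delay calculation over the stored events. The names below (`replayDelays`, `speed`) are illustrative, not spec requirements; a `speed` of 1 reproduces original timing, while higher values accelerate playback:

```typescript
// Minimal sketch: each stored event carries its original timestamp.
interface StoredEvent { ts: number; payload: unknown }

// Compute, for each event, the delay (ms) from replay start before
// re-emitting it. speed = 1 -> original pacing; speed = 10 -> 10x faster.
function replayDelays(events: StoredEvent[], speed = 1): number[] {
  if (events.length === 0) return [];
  const t0 = events[0].ts;
  return events.map(e => (e.ts - t0) / speed);
}
```

Because replay only schedules re-emission of saved events, no sandbox or runner involvement is needed, which is what makes it free of execution cost.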
---
## Backend Behavior
* **Pipeline ingestion** – validate incoming code or generated descriptions, parse to a node-edge graph.
* **Run engine** – execute each node in sequence or in parallel as defined, emitting span and log events.
* **Replay engine** – read stored events and re-emit them over the live channel for accurate playback.
* **Metrics calculator** – compute run-level metrics such as latency, token usage, and success rate.
* **Voice/description input** – convert a natural language request into a validated pipeline description and present a diff for approval.
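The metrics calculator could, for example, derive run-level latency and success rate from completed spans. The `Span` shape and `runMetrics` name are assumptions for illustration; token usage would come from node-level metadata not modeled here:

```typescript
// Hypothetical completed-span record; the real event schema is up to the implementer.
interface Span { nodeId: string; start: number; end: number; ok: boolean }

// Run latency = wall-clock span of the whole run (earliest start to latest end),
// which accounts for parallel nodes; success rate = fraction of nodes that succeeded.
function runMetrics(spans: Span[]): { latencyMs: number; successRate: number } {
  const start = Math.min(...spans.map(s => s.start));
  const end = Math.max(...spans.map(s => s.end));
  const ok = spans.filter(s => s.ok).length;
  return { latencyMs: end - start, successRate: ok / spans.length };
}
```

Using min/max over all spans (rather than summing durations) matters when nodes run in parallel: summed durations would overstate the run's real latency.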
---
## Event Model
Each run, including replays and reruns, emits a unified stream of events:
* **`run_status`** – overall state changes.
* **`span_start` / `span_end`** – begin and complete processing for each node.
* **`log`** – structured log messages tied to nodes.
Events include timestamps, ordering identifiers, and any relevant metadata to support live display and faithful replay.
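One possible encoding of this stream is a discriminated union. Field names such as `runId`, `seq`, and `ts` are assumptions chosen to satisfy the ordering and metadata requirements above, not mandated by the spec:

```typescript
// Sketch of the unified run-event stream. Every variant carries a timestamp
// (ts, ms epoch) and an ordering identifier (seq) to support faithful replay.
type RunEvent =
  | { kind: "run_status"; runId: string; seq: number; ts: number;
      status: "queued" | "running" | "succeeded" | "failed" }
  | { kind: "span_start"; runId: string; seq: number; ts: number; nodeId: string }
  | { kind: "span_end"; runId: string; seq: number; ts: number; nodeId: string; ok: boolean }
  | { kind: "log"; runId: string; seq: number; ts: number; nodeId: string;
      level: "debug" | "info" | "warn" | "error"; message: string };

// Total order for playback: timestamp first, seq breaks ties between
// events emitted in the same millisecond.
function sortForReplay(events: RunEvent[]): RunEvent[] {
  return [...events].sort((a, b) => a.ts - b.ts || a.seq - b.seq);
}
```

The `seq` field is the important detail: timestamps alone cannot guarantee a stable order when several nodes emit events concurrently.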
---
## Security and Safety
* Sandbox execution so untrusted code cannot access system resources or unauthorized networks.
* Optional user authentication and roles (e.g., owner, editor, viewer).
* Audit trail of sensitive actions such as applying or running a pipeline.
---
## Deployment
* Delivered as one service with built-in web UI, API endpoints, and real-time event channel.
* Implementation team is free to choose hosting environment and deployment strategy.
---
## Acceptance Criteria
* Paste code and see an accurate graph almost immediately.
* Start a run and watch logs and metrics stream in real time.
* Replay any past run with correct timing and visual fidelity.
* Optionally rerun and compare results to detect drift.
* Describe a pipeline by voice or text and apply the generated diff.
---
## Delivery Roadmap
1. **Core editor & preview** – code editing, validation, and graph rendering.
2. **Runner & streaming** – live execution and real-time event delivery.
3. **Replays & reruns** – event capture, playback, and rerun support in the UI.
4. **Voice or text pipeline generation** – natural language to code diff and apply flow.
---
## Waterfall View
The Waterfall area is a time-sequence view of how the pipeline’s nodes actually ran. Think of it like the network tab in a browser dev-tools panel, but for your agent pipeline.
Here’s how it works:
* **Purpose** – it lets you see the critical path of a run. Each bar represents a node’s start and finish time. You can instantly spot which steps overlapped, which blocked the flow, and where delays occurred.
* **Live vs. replay** – during a live run, the bars grow in real time as each node starts and ends. When you replay a past run, the same bars animate exactly as they did originally, so you can watch the pipeline “re-run” without actually executing anything.
* **Debugging value** – if a pipeline slows down or fails, the waterfall helps you pinpoint whether the bottleneck was a single long-running node, an external call, or a chain of dependencies.
In short, the Waterfall panel is a visual timeline of node execution, giving you a quick, intuitive way to understand run order, overlap, and performance bottlenecks.
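The bar layout described above reduces to a simple transformation from span timings to horizontal offsets and widths. The types and names below (`SpanTiming`, `Bar`, `waterfallBars`) are illustrative, not mandated:

```typescript
// Raw timing for one node's span, taken from span_start/span_end events.
interface SpanTiming { nodeId: string; start: number; end: number }

// One waterfall bar: horizontal offset and width in ms, relative to the
// earliest span in the run. A renderer would scale these to pixels.
interface Bar { nodeId: string; offsetMs: number; widthMs: number }

function waterfallBars(spans: SpanTiming[]): Bar[] {
  const t0 = Math.min(...spans.map(s => s.start));
  return spans.map(s => ({
    nodeId: s.nodeId,
    offsetMs: s.start - t0,
    widthMs: s.end - s.start,
  }));
}
```

Overlapping bars (two spans whose offset ranges intersect) immediately reveal parallelism; a bar that starts only when another ends reveals a dependency or a blocking step.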
