The Agent Run Timeline: Building a Replay UI

A timeline you can scrub. The web component, the data model, and the keyboard shortcuts that turn an opaque run into something a junior SRE can debug.

The UI shape

The replay UI is a horizontal timeline. Each step is a rectangle sized by its duration and coloured by its type (model call, tool call, or decision). Clicking a step opens its full payload, the right-arrow key jumps to the next step, and the slash key searches across steps. The run_id and the focused step are persisted in the URL, so sharing a link shares a specific moment in the run.

Horizontal timeline. Each step as a rectangle; sized by duration; coloured by type.
Click for payload. Full step detail revealed on click; the inspection surface.
Right-arrow and slash navigation. Next step, search; the keyboard primitives.
URL state persistence. run_id and focused step; share specific moments via link.
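
The URL persistence can be sketched as a pair of pure functions. This is a minimal sketch, not a definitive scheme: the query parameter names run_id and step, and the ReplayUrlState shape, are assumptions made for illustration. URLSearchParams is the standard web API.

```typescript
// Hypothetical shape of the state kept in the URL.
interface ReplayUrlState {
  runId: string;
  focusedStep: number | null; // index into the step array, null if nothing is focused
}

// Serialise the state into a query string such as "run_id=abc&step=12".
// The parameter names are assumptions, not a documented scheme.
function encodeState(state: ReplayUrlState): string {
  const params = new URLSearchParams();
  params.set("run_id", state.runId);
  if (state.focusedStep !== null) {
    params.set("step", String(state.focusedStep));
  }
  return params.toString();
}

// Parse a query string back into state. Returns null when run_id is missing.
// A malformed or negative step is treated as "no step focused" so that a
// damaged link still opens the run.
function decodeState(query: string): ReplayUrlState | null {
  const params = new URLSearchParams(query);
  const runId = params.get("run_id");
  if (!runId) return null;
  const raw = params.get("step");
  const focusedStep = raw !== null && /^\d+$/.test(raw) ? Number(raw) : null;
  return { runId, focusedStep };
}
```

In a browser, the encoded string would typically be written with history.replaceState whenever the focused step changes, and decoded from location.search on load.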

Data model

The data model is a flat array. Each step is stored with start_ms, duration_ms, type, and payload, so the array sorts and renders trivially. Decisions are first-class steps: “decided to call tool X with args Y” is a step the operator can scrub to. Errors are highlighted with a red bar so that tool failures, hallucination flags, and refusals surface visually.

Flat array of steps. start_ms, duration_ms, type, payload; sorts and renders trivially.
Decisions as steps. “Decided to call X with Y” is a first-class step the operator can scrub to.
Red-bar errors. Tool failures, hallucination flags, refusals; surfaced visually.
Per-step payload. Full context attached to each step; the basis for deep investigation.
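
To make "sorts and renders trivially" concrete, here is a sketch of the step type and a layout function that turns the flat array into rectangles. The four core field names come from the text; the optional error field, the colour values, and the Rect shape are assumptions for illustration.

```typescript
type StepType = "model_call" | "tool_call" | "decision";

interface Step {
  start_ms: number;
  duration_ms: number;
  type: StepType;
  payload: unknown;
  // Assumed field: how an error is recorded is not specified in the text.
  error?: "tool_failure" | "hallucination_flag" | "refusal";
}

// Hypothetical render primitive: one rectangle per step.
interface Rect {
  x: number;         // left edge in pixels, relative to the first step
  width: number;     // pixels, proportional to duration
  color: string;     // fill colour chosen by step type
  errorBar: boolean; // draw the red error bar on this step
}

// Placeholder colours; any palette that separates the three types works.
const COLORS: Record<StepType, string> = {
  model_call: "#4c78a8",
  tool_call: "#f58518",
  decision: "#54a24b",
};

// Sort by start time, then map each step to a rectangle. minWidth keeps
// very short steps clickable.
function layout(steps: Step[], pxPerMs: number, minWidth: number = 2): Rect[] {
  if (steps.length === 0) return [];
  const sorted = steps.slice().sort((a, b) => a.start_ms - b.start_ms);
  const origin = sorted[0].start_ms;
  return sorted.map((s) => ({
    x: (s.start_ms - origin) * pxPerMs,
    width: Math.max(minWidth, s.duration_ms * pxPerMs),
    color: COLORS[s.type],
    errorBar: s.error !== undefined,
  }));
}
```

Because a decision is just another Step with type "decision", it gets a rectangle and a focus position like any model or tool call.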

Performance considerations

Performance matters for long runs. Runs with hundreds of steps need virtualisation: render only the visible region and compute the rest on demand. Payload bodies can be large, so lazy-load them; the timeline list then stays fast even when individual payloads are heavy. Cache aggressively, because replay is read-only on a finished run.

Virtualise long runs. Render only the visible region; the rest is computed on demand.
Lazy-load payloads. Bodies can be large; timeline list stays fast.
Aggressive caching. Read-only on finished run; cache the entire timeline structure.
Per-run cache invalidation. The run id is the cache key; a refresh invalidates exactly one run's entries.
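
The virtualisation and caching ideas above can be sketched as follows. The sketch assumes the steps are sorted by start_ms and do not overlap, which is what makes binary search valid; visibleRange, lowerBound, and PayloadCache are hypothetical names, and the loader is injected rather than tied to any particular fetch API.

```typescript
// Minimal step shape needed for windowing.
interface Span {
  start_ms: number;
  duration_ms: number;
}

// Smallest i in [0, n] for which pred(i) is true, assuming pred flips from
// false to true at most once as i grows (binary search).
function lowerBound(n: number, pred: (i: number) => boolean): number {
  let lo = 0;
  let hi = n;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (pred(mid)) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Half-open index range [first, end) of the steps that intersect the
// viewport [viewStart, viewEnd), both in milliseconds.
function visibleRange(steps: Span[], viewStart: number, viewEnd: number): [number, number] {
  const first = lowerBound(steps.length, (i) => steps[i].start_ms + steps[i].duration_ms > viewStart);
  const end = lowerBound(steps.length, (i) => steps[i].start_ms >= viewEnd);
  return [first, Math.max(first, end)];
}

// Lazy per-run cache: a payload is loaded the first time it is requested and
// reused afterwards. T would typically be a Promise, so that concurrent
// requests for the same step share one in-flight load.
class PayloadCache<T> {
  private load: (runId: string, index: number) => T;
  private byRun: { [runId: string]: { [index: number]: T } } = {};

  constructor(load: (runId: string, index: number) => T) {
    this.load = load;
  }

  get(runId: string, index: number): T {
    const bucket = this.byRun[runId] || (this.byRun[runId] = {});
    if (!(index in bucket)) bucket[index] = this.load(runId, index);
    return bucket[index];
  }

  // Drop everything cached for one run without touching other runs.
  invalidateRun(runId: string): void {
    delete this.byRun[runId];
  }
}
```

Only the steps inside the returned range need DOM nodes; the cost of finding them is logarithmic in the run length rather than linear.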

Keyboard shortcuts

Treat the UI like a developer tool. The right and left arrows move to the next and previous step, shift+arrow jumps 10 steps, and home/end go to the first and last step. Slash focuses search, and enter on a result jumps to that step. Esc closes the detail panel, and the question mark shows the shortcut reference.

Arrow navigation. Right/left for next/previous; shift+arrow jumps 10; home/end for first/last.
Slash for search. Focus search; enter on result jumps to step.
Esc closes detail. The standard escape; consistent with developer tools.
Question mark for help. Shows the shortcut reference; makes the shortcuts discoverable.
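
One way to keep these shortcuts testable is to separate "which action does this key mean" from "what does the action do to the focused step". The sketch below does that with two pure functions. The Action type and function names are hypothetical; the key strings are the standard KeyboardEvent.key values.

```typescript
// Hypothetical set of actions the timeline understands.
type Action =
  | { kind: "move"; delta: number }
  | { kind: "first" }
  | { kind: "last" }
  | { kind: "focus_search" }
  | { kind: "close_detail" }
  | { kind: "show_help" };

// The subset of KeyboardEvent this sketch needs, so it runs without a DOM.
interface KeyInput {
  key: string;
  shiftKey: boolean;
}

// Map a key press to an action, or null if the key is not a shortcut.
function actionForKey(e: KeyInput): Action | null {
  switch (e.key) {
    case "ArrowRight": return { kind: "move", delta: e.shiftKey ? 10 : 1 };
    case "ArrowLeft": return { kind: "move", delta: e.shiftKey ? -10 : -1 };
    case "Home": return { kind: "first" };
    case "End": return { kind: "last" };
    case "/": return { kind: "focus_search" };
    case "Escape": return { kind: "close_detail" };
    case "?": return { kind: "show_help" };
    default: return null;
  }
}

// Apply a navigation action to the focused step index, clamped to the run.
// Non-navigation actions leave the focus unchanged.
function nextFocus(current: number, stepCount: number, action: Action): number {
  if (stepCount <= 0) return 0;
  const last = stepCount - 1;
  switch (action.kind) {
    case "move": return Math.min(last, Math.max(0, current + action.delta));
    case "first": return 0;
    case "last": return last;
    default: return current;
  }
}
```

In a browser these would be called from a keydown listener. That listener should ignore events whose target is a text input, so that typing "/" or "?" into the search box does not trigger a shortcut.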

Export and share

Three export formats cover the main needs. JSON export produces the whole run as a downloadable file, useful for offline analysis and bug reports. Markdown export produces a narrative version with each step’s key info, useful for postmortem writeups. A direct link is a URL that opens the timeline at a specific step, useful for chat conversations about specific moments.

JSON export. Whole run downloadable; offline analysis and bug reports.
Markdown export. Narrative version with key info per step; postmortem writeups.
Direct link. URL opens timeline at specific step; chat conversations.
Per-format intended use. Each format is optimised for a specific consumer: analysis tooling, postmortem readers, or chat participants.
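
The JSON and Markdown exports can be sketched as two small serialisers over the same step array. What counts as "key info" per step is not specified, so this sketch assumes type, start time, and duration, plus an optional summary field that is hypothetical. The function names and the JSON envelope are also assumptions.

```typescript
interface ExportStep {
  start_ms: number;
  duration_ms: number;
  type: string;
  payload: unknown;
  summary?: string; // hypothetical one-line description of the step
}

// Whole run as a JSON string, suitable for offering as a file download.
function toJson(runId: string, steps: ExportStep[]): string {
  return JSON.stringify({ run_id: runId, steps: steps }, null, 2);
}

// Narrative version: a title line followed by one numbered line per step.
// Payloads are deliberately left out to keep the writeup readable.
function toMarkdown(runId: string, steps: ExportStep[]): string {
  const lines: string[] = [`# Run ${runId}`, ""];
  steps.forEach((s, i) => {
    const tail = s.summary ? `: ${s.summary}` : "";
    lines.push(`${i + 1}. **${s.type}** at ${s.start_ms} ms (${s.duration_ms} ms)${tail}`);
  });
  return lines.join("\n");
}
```

The JSON export keeps full payloads because its consumers are tools; the Markdown export drops them because its consumers are people reading a postmortem.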