Dashboard Studio: Drag-and-Drop Widget Authoring

Five minutes from empty canvas to a production-grade SLO dashboard. We rebuilt the authoring surface around the pattern teams actually want: pick a template, swap data sources, ship it.

Why we rebuilt it

The old dashboard editor had a form for every property: title, query, axis, threshold, colour. It worked. It also took 20 minutes to produce a dashboard nobody loved, and the second one was always a copy-paste of the first. Teams ended up maintaining one good dashboard and forty cargo-cult clones nobody trusted.

The honest read: dashboard authoring isn't a config problem, it's a layout problem. The hard part is deciding what goes where and how panels relate to each other, not whether an axis is logarithmic. So we threw out the property forms and rebuilt the surface around drag-and-drop.

The new Dashboard Studio gives you a snapping grid, a widget palette on the left, and a live data preview as you drop. You're authoring with real metrics from second one, with no save-and-reload loop to find out the panel is empty.

Templates that bootstrap

Every dashboard starts with a template. We ship 14 of them, including RED, USE, SLO burn-down, deployment cadence, error budget, and on-call load, and each one assumes nothing about your service registry. You pick a template, Nova binds it to a service, and the panels populate from your live telemetry.

Behind the scenes, templates are JSON manifests with placeholder bindings. The binding step resolves {{service}} against your registry and rewrites every query in the manifest. You see a working dashboard in about three seconds; you spend the rest of your five minutes tuning thresholds and re-ordering panels.
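To make the binding step concrete, here is a minimal sketch of placeholder resolution over a manifest. The manifest layout, the `bind` helper, and the example query are assumptions for illustration, not Nova's actual schema or API; only the {{service}} placeholder comes from the description above.

```python
import re

# Matches {{name}} with optional inner whitespace (assumed placeholder syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def bind(node, bindings):
    """Return a copy of the manifest with every {{name}} placeholder resolved.

    Walks dicts and lists recursively and rewrites each string it finds.
    An unknown placeholder raises KeyError rather than shipping a broken query.
    """
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: bindings[m.group(1)], node)
    if isinstance(node, list):
        return [bind(item, bindings) for item in node]
    if isinstance(node, dict):
        return {key: bind(value, bindings) for key, value in node.items()}
    return node  # numbers, booleans, None pass through unchanged


# Hypothetical manifest: field names are illustrative only.
manifest = {
    "template": "slo-burn-down",
    "version": 1,
    "panels": [
        {
            "title": "Error rate: {{service}}",
            "query": 'sum(rate(http_requests_total{service="{{service}}"}[5m]))',
        }
    ],
}

bound = bind(manifest, {"service": "checkout"})
print(bound["panels"][0]["query"])
```

Returning a new structure instead of mutating the manifest keeps the template reusable for the next service that binds to it.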

Templates are versioned. When we ship a v2 of the SLO template, dashboards built from v1 don't auto-migrate; they get a banner saying "v2 available" with a diff view. You upgrade when you're ready, not when we decide.
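One way to picture the diff view is a plain text diff over the two manifest versions. The sketch below uses Python's standard difflib; the manifest fields and the `manifest_diff` helper are hypothetical, and the real diff view may well be structural rather than line-based.

```python
import difflib
import json


def manifest_diff(old, new):
    """Return unified-diff lines between two template manifests.

    Serialising with sorted keys and fixed indentation keeps the diff
    stable, so only real changes show up.
    """
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile="v1", tofile="v2", lineterm="")
    )


# Hypothetical manifests: field names are illustrative only.
v1 = {"template": "slo", "version": 1, "threshold": 0.99}
v2 = {"template": "slo", "version": 2, "threshold": 0.999}

for line in manifest_diff(v1, v2):
    print(line)
```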

The widget library

The palette has 22 widgets. Nine of them are charts: line, area, bar, stacked bar, heatmap, hex-bin, sparkline, gauge, and the new burn-rate widget. The other 13 are non-chart panels: markdown, runbook embed, incident list, deploy timeline, SLO summary, agent activity, log tail, trace flame, and a handful more.

Each widget knows three things: the data shape it accepts, the controls it exposes, and how it should resize. The grid uses 12 columns; widgets snap to multiples; the layout reflows on small screens without you laying it out twice. We tested this against the 80 most-used dashboards from our beta tenants, and every one of them looked sane on a 13-inch laptop.
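As a rough illustration of snapping and reflow on a 12-column grid, here is a small sketch. It assumes widgets snap to whole columns and that a narrower screen repacks widgets row by row in their original order; the `Placement` type and both functions are hypothetical, not the Studio's layout engine.

```python
from dataclasses import dataclass

COLUMNS = 12  # grid width stated in the post


@dataclass(frozen=True)
class Placement:
    col: int   # left edge as a 0-based column index
    span: int  # width in whole columns


def snap(x_px: float, width_px: float, canvas_px: float) -> Placement:
    """Snap a dropped widget to the nearest whole-column position and width."""
    col_px = canvas_px / COLUMNS
    span = min(COLUMNS, max(1, round(width_px / col_px)))
    col = min(COLUMNS - span, max(0, round(x_px / col_px)))  # keep it on the grid
    return Placement(col, span)


def reflow(placements: list[Placement], columns: int) -> list[tuple[int, Placement]]:
    """Repack widgets, in order, into a narrower grid as (row, placement) pairs."""
    packed = []
    row, cursor = 0, 0
    for p in placements:
        span = min(p.span, columns)   # clamp widgets wider than the new grid
        if cursor + span > columns:   # no room left on this row: wrap
            row, cursor = row + 1, 0
        packed.append((row, Placement(cursor, span)))
        cursor += span
    return packed


# A widget dropped at x=260px, 430px wide, on a 1200px canvas.
print(snap(260, 430, 1200))

# Two half-width panels and one full-width panel, repacked into 6 columns.
print(reflow([Placement(0, 6), Placement(6, 6), Placement(0, 12)], columns=6))
```

Because reflow is derived from the one layout you authored, there is no second mobile layout to keep in sync.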

The widget that surprised us most in beta was "agent activity", a live feed of which Nova agents are touching the service the dashboard belongs to. Engineers started pinning it to incident dashboards because it answered "is Nova still trying things?" without flipping tabs. We've since made it a default panel on the incident template.

The workspace pattern

Once you cross 30 dashboards, navigation becomes the problem. We added Workspaces: folders with permissions, tags, and a default landing dashboard. A workspace can be public to the whole org or scoped to a team. Dashboards inside inherit the workspace's data source defaults, so you stop pasting the same Prometheus URL into every panel.

Workspaces also handle ownership. Every dashboard has an owner; when the owner leaves the org, the workspace owner inherits. Orphaned dashboards used to be the silent failure mode of every observability tool we'd used: three years in, half the dashboards on the wall belong to people who don't work there anymore. Workspaces fix that with one rule: no orphans.
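The no-orphans rule is simple enough to sketch in a few lines. The data shapes and the `reassign_orphans` helper below are hypothetical; the sketch covers only the rule as stated, and does not guess at what happens when the workspace owner is the one leaving.

```python
def reassign_orphans(dashboards, workspace_owners, departed):
    """Hand every dashboard owned by `departed` to the owner of its workspace.

    dashboards:       name -> {"owner": str, "workspace": str}
    workspace_owners: workspace name -> owner
    Returns the names of the dashboards that changed hands.
    """
    reassigned = []
    for name, dash in dashboards.items():
        if dash["owner"] == departed:
            dash["owner"] = workspace_owners[dash["workspace"]]
            reassigned.append(name)
    return reassigned


# Illustrative data only.
dashboards = {
    "checkout-slo": {"owner": "ana", "workspace": "payments"},
    "cart-latency": {"owner": "ben", "workspace": "payments"},
}
workspace_owners = {"payments": "chris"}

print(reassign_orphans(dashboards, workspace_owners, departed="ana"))
print(dashboards["checkout-slo"]["owner"])
```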

Past 100 dashboards

The pattern that emerged from beta tenants who crossed 100 dashboards: a small number of "wall" dashboards that everyone watches, a long tail of personal scratchpads, and a middle layer of team-owned service dashboards. The studio surfaces all three differently: wall dashboards live in a shared workspace, scratchpads live under your user, and team dashboards live in team workspaces.

Search across all of them is keyword + tag, and it's fast because we index dashboard manifests at write time, not at query time. Median search latency in production is 38ms across a tenant with 412 dashboards. The slowest tenant we've seen hits 71ms at p95 with 1,800 dashboards.
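Indexing at write time usually means keeping an inverted index that is updated on every save, so a search is a handful of set lookups instead of a scan over manifests. The sketch below shows that idea under stated assumptions: the `DashboardIndex` class, the `tag:` prefix, and the manifest fields are hypothetical, and the production index is surely more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class DashboardIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of dashboards containing it
        self._terms = {}                   # dashboard id -> terms, for re-indexing

    def save(self, dash_id, manifest):
        """Index a manifest when it is written; replaces any earlier entry."""
        for term in self._terms.pop(dash_id, set()):
            self._postings[term].discard(dash_id)
        terms = set(tokenize(manifest.get("title", "")))
        terms |= {f"tag:{tag.lower()}" for tag in manifest.get("tags", [])}
        for term in terms:
            self._postings[term].add(dash_id)
        self._terms[dash_id] = terms

    def search(self, keywords="", tags=()):
        """Return ids matching every keyword and every tag (AND semantics)."""
        terms = tokenize(keywords) + [f"tag:{tag.lower()}" for tag in tags]
        if not terms:
            return set()
        hits = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*hits)


index = DashboardIndex()
index.save("d1", {"title": "Checkout SLO burn-down", "tags": ["slo", "payments"]})
index.save("d2", {"title": "Checkout latency", "tags": ["red"]})

print(index.search("checkout"))
print(index.search("checkout", tags=["slo"]))
```

The cost moves to the save path, which is the right trade when dashboards are read far more often than they are edited.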

One last detail. Every dashboard has a "duplicate to scratchpad" button. We watched too many engineers ruin a wall dashboard by experimenting in production. Now they fork to a scratchpad with one click, edit freely, and propose changes back via a diff PR. It's the smallest feature in the release and the one we use most.