Reliability Engineering

Reliability in one screenshot: budget, MTTR, and trend together

Reliability Snapshot is the page you screenshot for the weekly review. SLO burn rate per service, error budget remaining per service, MTTR by tier, incident count by week, and a one-line recommendation per service. Everything that matters this week, on one screen.

  • 4 KPI tiles
  • Per-service budget remaining
  • 1 recommendation per service
  • Weekly snapshot for review
KPI Tiles

Four numbers leadership reads first

Four tiles sit at the top: SLOs at risk this week, average error budget remaining, this-week MTTR, and this-week incident count. Each tile shows its delta against the prior week, so improvement (or regression) is visible at a glance. Click any tile to drill into the underlying breakdown.

  • SLOs at risk: count of SLOs with fast-burn alerts firing this week, the first number leadership wants
  • Budget remaining: average across all SLOs, a single trend line for "are we running out of room?"
  • MTTR: this-week mean time to resolution; deltas calibrated to the per-tier targets
  • Incident count: this week's incident count, with severity weighting; click through to the heatmap for the week
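
As an illustration of how the four tiles and their week-over-week deltas could be derived, here is a minimal sketch. It is not the product's implementation: the `WeekStats` fields and the `kpi_tiles` function are hypothetical names, and the inputs are assumed to be already aggregated per week.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class WeekStats:
    """Hypothetical per-week inputs; field names are assumptions."""
    slos_at_risk: int                 # SLOs with fast-burn alerts firing
    budget_remaining: list[float]     # fraction of error budget left, one per SLO
    resolution_minutes: list[float]   # time to resolution, one per incident
    incident_count: int


def summarize(week: WeekStats) -> dict[str, float]:
    """Reduce one week of inputs to the four tile values."""
    return {
        "slos_at_risk": week.slos_at_risk,
        "avg_budget_remaining": mean(week.budget_remaining),
        # A week with no incidents has no MTTR; report 0.0 in this sketch.
        "mttr_minutes": mean(week.resolution_minutes) if week.resolution_minutes else 0.0,
        "incident_count": week.incident_count,
    }


def kpi_tiles(this_week: WeekStats, prior_week: WeekStats) -> dict[str, dict[str, float]]:
    """Return each tile's value plus its delta against the prior week."""
    now, before = summarize(this_week), summarize(prior_week)
    return {name: {"value": now[name], "delta": now[name] - before[name]} for name in now}
```

A negative delta on SLOs at risk, MTTR, or incident count is an improvement; a negative delta on budget remaining is a regression.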
Per-Service Recommendation

One concrete next action per service

Below the tiles, each service shows a one-line recommendation: tighten this SLO target, add capacity here, run a postmortem on that week's spike, retrain the prediction model for that service. The recommendations are generated from the underlying signals; they are concrete, not generic.

  • Concrete next action: not "improve reliability" but a specific, actionable, single step
  • Linked to the data: every recommendation has a "why we suggest this" link with the supporting numbers
  • Dismissible: recommendations you decline get logged; the system stops re-suggesting the same thing
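
The dismissal behavior can be pictured with a small sketch. This is an assumption about one way to model it, not a description of how the product works: `RecommendationLog` and `pick_one_per_service` are hypothetical names, and candidates are assumed to arrive already ranked by priority.

```python
from dataclasses import dataclass, field

# A candidate is a (service, action) pair, e.g. ("checkout", "add capacity").
Candidate = tuple[str, str]


@dataclass
class RecommendationLog:
    """Hypothetical record of recommendations a team has declined."""
    dismissed: set[Candidate] = field(default_factory=set)

    def dismiss(self, service: str, action: str) -> None:
        # Log the declined recommendation so it is not suggested again.
        self.dismissed.add((service, action))


def pick_one_per_service(candidates: list[Candidate], log: RecommendationLog) -> dict[str, str]:
    """Keep the highest-priority, not-yet-dismissed action for each service."""
    chosen: dict[str, str] = {}
    for service, action in candidates:  # assumed sorted by priority
        if (service, action) in log.dismissed:
            continue
        chosen.setdefault(service, action)  # first surviving candidate wins
    return chosen
```

With this shape, declining "add capacity" for a service surfaces that service's next-ranked recommendation instead of repeating the declined one.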
Weekly Snapshot

Frozen Friday at 5pm

The snapshot freezes every Friday at 5pm local time. The frozen snapshot is what shows up in the Monday review meeting and what gets emailed to leadership. Live numbers are still visible on the page, but the meeting numbers are stable so a Saturday incident does not retroactively change the slide.

  • Friday 5pm freeze: frozen snapshot stable through the weekend; live numbers still visible separately
  • Email digest: Monday morning email to a configurable list, defaults to platform-admin
  • Historical archive: every weekly snapshot is archived so you can compare any two weeks side by side
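
To make the freeze rule concrete, here is a minimal sketch of finding the most recent Friday 5pm cutoff for a given moment. The function name `last_freeze` is hypothetical, and the sketch assumes `now` is already expressed in the team's local time.

```python
from datetime import datetime, time, timedelta

FRIDAY = 4  # datetime.weekday() counts Monday as 0


def last_freeze(now: datetime) -> datetime:
    """Return the most recent Friday 17:00 at or before `now`.

    Works on wall-clock time in whatever timezone `now` carries
    (or none, for naive datetimes).
    """
    days_back = (now.weekday() - FRIDAY) % 7
    candidate = datetime.combine(
        now.date() - timedelta(days=days_back), time(17, 0), tzinfo=now.tzinfo
    )
    if candidate > now:
        # It is Friday but before 5pm, so last week's freeze still applies.
        candidate -= timedelta(days=7)
    return candidate
```

Under this rule, an incident on Saturday falls after the cutoff and lands in the following week's snapshot, which is why the Monday slide does not change.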
Permissions

Reliable views for non-engineers

Leadership and product partners often want the snapshot but should not see raw signals or run actions from it. The page has a read-only "executive view" with the snapshot, deltas, and recommendations, but no drill-ins to incidents or runbooks. One link, two audiences.

  • Executive view: snapshot + deltas + recommendations; no operational drill-ins
  • Engineer view: full snapshot plus drill-ins to incidents, runbooks, and historical archives
  • Embed-friendly: executive view is embed-safe so it can live in your wiki or BI tool
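
The two-audience split can be sketched as a simple allow-list per role. The role names, section keys, and `render_view` function below are illustrative assumptions, not the product's actual permission model.

```python
# Sections each view may show; keys are hypothetical names for page sections.
EXECUTIVE_SECTIONS = {"snapshot", "deltas", "recommendations"}
ENGINEER_SECTIONS = EXECUTIVE_SECTIONS | {"incidents", "runbooks", "archive"}


def render_view(page: dict[str, object], role: str) -> dict[str, object]:
    """Return only the sections the given role is allowed to see.

    Any role other than "engineer" falls back to the read-only
    executive view, so unknown roles get the most restrictive set.
    """
    allowed = ENGINEER_SECTIONS if role == "engineer" else EXECUTIVE_SECTIONS
    return {name: content for name, content in page.items() if name in allowed}
```

Defaulting unknown roles to the executive set keeps operational drill-ins out of embeds and shared links unless access is granted explicitly.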
Your weekly engineering review, pre-built

Stop assembling slides on Friday. The snapshot is the slide.
