Reliability Engineering

Define what reliable means, and let Nova hold every service to it

SLO Management is where you write down what reliable looks like for each service: SLI definitions, SLO targets, burn-rate alert rules, and error budget policies. Once defined, every service in Service Health Matrix tracks against your numbers, and the agents respect them when deciding whether to act.

  • 5 default SLO templates
  • YAML or UI definition
  • Multi-window burn rate alerts
  • Every change versioned

SLI Definition

Pick a signal, write a query

An SLI (Service Level Indicator) is the thing you measure: latency, availability, error rate, freshness, or anything else you can express as a query against your signals. Nova ships templates for the common ones and a free-form mode for custom indicators. Every SLI is testable in a sandbox before you commit it.

  • Five built-in SLI types: latency_quantile, availability_ratio, error_ratio, saturation, and data_freshness, all available on day one
  • Custom SLIs in NovaQL: write any query that returns a number per minute and Nova will track it as an SLI
  • Sandbox runs before commit: see what the SLI would have looked like over the past 30 days before you set the target
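To make the idea of "a query that returns a number" concrete, here is a minimal Python sketch of two of the indicator types named above. It is not NovaQL and not Nova's implementation; the function names, the integer `percent` parameter, and the nearest-rank quantile method are assumptions chosen for illustration.

```python
from typing import Sequence


def availability_ratio(good: int, total: int) -> float:
    """Fraction of requests that succeeded.

    Assumption: a window with no traffic counts as fully available.
    """
    if total == 0:
        return 1.0
    return good / total


def latency_quantile(samples_ms: Sequence[float], percent: int) -> float:
    """Nearest-rank quantile of latency samples, e.g. percent=95 for p95.

    Assumption: nearest-rank is used for simplicity; a real backend may
    interpolate or use histogram buckets instead.
    """
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer ceiling of percent/100 * n, avoiding floating-point rounding.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

Running functions like these over a past window of data is, in spirit, what a sandbox preview does: you see the values the SLI would have produced before choosing a target.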
SLO Targets & Windows

Pick the number, pick the window

A target ("p95 under 200ms") plus a window ("30 days, rolling") makes an SLO. Nova supports rolling windows (last 30 days) and calendar windows (this month, this quarter). Calendar windows reset; rolling windows do not. Pick the one that matches your reporting cadence.

  • Rolling vs calendar: rolling is always the last N days; calendar is this month, quarter, or year. Pick per SLO
  • Composite SLOs: an SLO can require multiple SLIs ("p95 under 200ms AND error rate under 0.1%") for stricter governance
  • Per-tier defaults: tier-0 services start at 99.9%, tier-1 at 99.5%, and tier-2 at 99%; override per service as needed
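The arithmetic behind targets and windows can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the composite thresholds are the example values quoted above, and nothing here reflects how Nova computes these internally.

```python
from datetime import datetime, timedelta


def error_budget_minutes(target: float, window_days: int) -> float:
    """Minutes of full unavailability a target allows over the window."""
    return (1.0 - target) * window_days * 24 * 60


def rolling_window_start(now: datetime, days: int) -> datetime:
    """A rolling window always looks back the same fixed span."""
    return now - timedelta(days=days)


def calendar_month_start(now: datetime) -> datetime:
    """A calendar-month window resets at the first instant of the month."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def composite_met(p95_ms: float, error_rate: float,
                  max_p95_ms: float = 200.0,
                  max_error_rate: float = 0.001) -> bool:
    """Composite SLO: every component condition must hold (logical AND)."""
    return p95_ms < max_p95_ms and error_rate < max_error_rate
```

For example, a 99.9% target over 30 days leaves roughly 43.2 minutes of error budget, while 99% over the same window leaves about 432 minutes.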
Burn-Rate Alerts

Two windows, two alerts, no flapping

Nova uses the multi-window, multi-burn-rate pattern from the Google SRE Workbook. A short window with a high burn (6h × 2x) pages on-call when something is acutely wrong. A long window with a lower burn (24h × 1x) notifies the team when something is slowly draining. Two alerts, far fewer false alarms.

  • Fast-burn alert: 6h window × 2x burn rate threshold → page on-call (acute)
  • Slow-burn alert: 24h window × 1x burn rate threshold → notify team channel (drift)
  • Tunable per SLO: override windows and ratios for SLOs where the defaults do not fit your traffic shape
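A burn rate is the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1x spends the budget at exactly the sustainable pace. The Python sketch below evaluates the two default rules listed above. The `BurnRule` type, the function names, and the shape of the `counts` input are assumptions for illustration, not Nova's alert engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class BurnRule:
    name: str
    window_hours: int
    threshold: float  # burn-rate multiple that triggers the rule
    action: str


# Defaults taken from the bullets above; both are tunable per SLO.
DEFAULT_RULES = (
    BurnRule("fast-burn", 6, 2.0, "page on-call"),
    BurnRule("slow-burn", 24, 1.0, "notify team channel"),
)


def burn_rate(bad: int, total: int, target: float) -> float:
    """Observed error ratio divided by the error ratio the target allows."""
    if total == 0:
        return 0.0  # assumption: no traffic burns no budget
    return (bad / total) / (1.0 - target)


def firing_actions(rules: Sequence[BurnRule],
                   counts: Dict[int, Tuple[int, int]],
                   target: float) -> List[str]:
    """Return the action of every rule whose window exceeds its threshold.

    `counts` maps window length in hours to (bad_events, total_events).
    """
    return [
        rule.action
        for rule in rules
        if burn_rate(*counts[rule.window_hours], target) >= rule.threshold
    ]
```

With a 99.9% target, 30 bad requests out of 10,000 in the last 6 hours is a burn rate of about 3x, which exceeds the fast-burn threshold; 50 bad out of 100,000 over 24 hours is about 0.5x, which does not trip the slow-burn rule.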
Versioning & Review

SLOs are code, reviewed and versioned

Every SLO change creates a new version with a diff, an author, a reason, and an optional reviewer. Tighten a target by accident? Roll it back to the prior version with one click. Export the whole library to YAML for IaC. Import from YAML for GitOps. SLOs that survive review are SLOs that survive an audit.

  • Diff + author + reason: every change shows what was changed, by whom, when, and why, visible from the SLO detail page
  • YAML export and import: GitOps-friendly, so you can store SLOs as YAML in your repo and sync them to Nova on merge
  • Reviewer gate (optional): require a reviewer for tier-0 SLO changes so a junior engineer cannot loosen a critical target alone
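The versioning model described above can be pictured as an append-only history. The Python sketch below is a hypothetical illustration: `SLOVersion`, `SLOHistory`, and their methods are invented names, and modelling a rollback as a new version that copies the prior spec is an assumption, not a description of Nova's storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SLOVersion:
    number: int
    spec: dict
    author: str
    reason: str
    reviewer: Optional[str] = None


@dataclass
class SLOHistory:
    versions: List[SLOVersion] = field(default_factory=list)

    def commit(self, spec: dict, author: str, reason: str,
               reviewer: Optional[str] = None,
               require_reviewer: bool = False) -> SLOVersion:
        """Append a new version; optionally enforce the reviewer gate."""
        if require_reviewer and reviewer is None:
            raise PermissionError("this change requires a reviewer")
        version = SLOVersion(len(self.versions) + 1, dict(spec),
                             author, reason, reviewer)
        self.versions.append(version)
        return version

    def diff(self, old: int, new: int) -> Dict[str, Tuple[object, object]]:
        """Map each changed key to its (old, new) values."""
        a = self.versions[old - 1].spec
        b = self.versions[new - 1].spec
        return {k: (a.get(k), b.get(k))
                for k in sorted(set(a) | set(b))
                if a.get(k) != b.get(k)}

    def rollback(self, author: str) -> SLOVersion:
        """Restore the prior spec by committing it as a new version."""
        if len(self.versions) < 2:
            raise ValueError("no prior version to roll back to")
        prior = self.versions[-2]
        return self.commit(prior.spec, author,
                           f"rollback to v{prior.number}")
```

Keeping the rollback as its own version preserves the audit trail: the accidental change, its author, and the correction all remain visible in the history.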

Reliability you can govern, not just dashboard

SLOs in Nova are first-class objects: versioned, reviewable, and enforced by the agents and the alert pipeline.
