Service Reliability

Every service. Every SLO.
One view.

The Service Health Matrix gives your SRE team a single-pane view across all 96+ services: live SLO compliance, P50/P95/P99 latency, error rates, and expandable dependency rows. No more tab-hopping between dashboards when you're trying to find the blast radius of a degradation.

[Screenshot: Nova AI Service Health Matrix, live view]

  • Services tracked: 96+
  • SLO compliance: real-time
  • Latency percentiles: P50/P95/P99
  • Dependency rows: expandable

Health Donut & KPI Strip

At-a-glance breakdown: healthy, degraded, critical across every service.

Every Service Health Matrix view opens with a live donut chart showing the ratio of healthy, degraded, and critical services across your entire fleet. Below it, a KPI strip surfaces the numbers that matter right now (total SLO breaches, services burning error budget, and open incidents), so on-call engineers have full context in under three seconds.

  • Live health donut: green/amber/red breakdown updates in real time as service status changes
  • KPI strip: active SLO violations, error budget burn rate, and open P1/P2 incidents surfaced at the top of every view
  • Click-through drill-down: click any segment to filter the full matrix to only degraded or critical services instantly

[Screenshot: Health Overview, service health donut and KPI strip]

Latency Distribution

P50, P95, P99: per service, with historical trend and SLA breach alerting.

Averages lie. The Service Health Matrix tracks P50, P95, and P99 latency per service and compares them against your defined SLA thresholds in real time. When P99 creeps toward your SLA ceiling, Nova fires a warning alert before the breach, so your team can investigate before customers feel the slowness.

  • Three-percentile tracking: P50, P95, and P99 columns per service with color-coded SLA proximity indicators
  • Historical trend comparison: compare this hour against the same hour last week to spot gradual regressions
  • Pre-breach alerting: configurable warning threshold at 80% of SLA ceiling so you act before customers are impacted
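
As a rough illustration of the pre-breach idea, the sketch below computes nearest-rank percentiles from raw latency samples and classifies a service as ok, warning, or breach. It is not Nova's implementation: the function names `percentile` and `sla_status`, the nearest-rank method, and the `warn_ratio` parameter are assumptions made for this example; only the 80% warning threshold comes from the text above.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_status(samples, sla_ceiling_ms, warn_ratio=0.8):
    """Classify a service by how close its P99 is to the SLA ceiling.

    warn_ratio is a hypothetical parameter: 0.8 means "warn at 80% of the ceiling".
    """
    p99 = percentile(samples, 99)
    if p99 >= sla_ceiling_ms:
        return "breach"
    if p99 >= warn_ratio * sla_ceiling_ms:
        return "warning"
    return "ok"


# 98 fast requests and 2 very slow ones: the mean hides the tail, P99 does not.
latencies_ms = [100] * 98 + [2000] * 2
summary = {
    "mean": mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
}
```

Here the mean is 138 ms and P50 is 100 ms, while P99 is 2000 ms, which is why the percentile columns, not the average, drive the alert. With an SLA ceiling of 2400 ms, `sla_status(latencies_ms, 2400)` would return `"warning"` because 2000 ms is above 80% of the ceiling (1920 ms).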

[Screenshot: Latency Percentiles, P50/P95/P99 latency distribution]

Error Rate Comparison

Spot error outliers and cascading failures across all services simultaneously.

When a database connection pool is exhausted, error rates cascade across every service that depends on it, but they don't all spike at the same time or at the same rate. The Error Rate Comparison view shows all services side by side on the same timeline, making cascading failures visually obvious within seconds of them starting.

  • Side-by-side error charts: all 96+ services plotted on a shared timeline, so outlier spikes are immediately visible
  • Cascade detection: services with correlated error rate spikes are automatically grouped and flagged as a potential cascade
  • Baseline comparison: error rates are compared against a rolling 7-day baseline, and deviations beyond normal variance are highlighted as anomalies
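
To make the two ideas above concrete, here is a minimal sketch of a baseline check and a correlation-based grouping. How Nova actually detects cascades is not described on this page, so everything here is an assumption: the function names, the "mean plus k standard deviations" rule, the Pearson correlation, and the 0.9 threshold are illustrative choices only.

```python
from statistics import mean, pstdev


def is_anomalous(baseline, current, k=3.0):
    """Flag `current` if it exceeds the baseline mean by more than k std devs."""
    return current > mean(baseline) + k * pstdev(baseline)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def correlated_groups(series, threshold=0.9):
    """Group services whose error-rate series correlate at or above threshold."""
    groups = []
    for name in sorted(series):
        # Every existing group with at least one strongly correlated member.
        linked = [g for g in groups
                  if any(pearson(series[name], series[o]) >= threshold for o in g)]
        merged = {name}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    # Only groups of two or more services count as a potential cascade.
    return [sorted(g) for g in groups if len(g) > 1]


# Hypothetical error rates (%) per minute for four services.
error_rates = {
    "db":   [1, 1, 1, 5, 9, 9],
    "api":  [0.5, 0.5, 0.5, 2.5, 4.5, 4.5],
    "auth": [2, 2, 2, 6, 10, 10],
    "cdn":  [1, 2, 1, 2, 1, 2],
}
cascades = correlated_groups(error_rates)
```

In this toy data, `db`, `api`, and `auth` rise together and end up in one group, while `cdn` is left out. Because real cascades are staggered in time, a production version would likely need to compare series at several time lags rather than only point by point.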

[Screenshot: Error Rate Comparison across services]

SLO Compliance Table

Sort by burn rate. Find which services are eating your error budget fastest.

The SLO Compliance Table is sortable, searchable, and expandable. Sort by current burn rate to find the services most at risk of SLO violation this week. Expand any row to see dependency chains, historical compliance trends, and exactly how much error budget remains. Everything your team needs to prioritize reliability work, in one table.

  • Sortable burn rate column: instantly identify which services are consuming error budget fastest and need immediate attention
  • Expandable dependency rows: each service row expands to show upstream and downstream dependencies with their own SLO status
  • Remaining budget visualization: green/amber/red budget bars show at a glance how close each service is to SLO violation
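
For readers new to the terms, the sketch below uses the common SRE definitions: the error budget is `1 - SLO target`, and burn rate is the observed error rate divided by that budget (a burn rate of 1.0 spends the budget exactly over the SLO period). The `ServiceWindow` type, the function names, and the sample numbers are hypothetical, and this is not a description of how Nova computes its columns.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Request counts for one service over one measurement window (hypothetical)."""
    name: str
    slo_target: float  # e.g. 0.999 means 99.9% of requests must succeed
    total: int
    failed: int


def burn_rate(w):
    """Observed error rate divided by the error budget (1 - SLO target)."""
    budget = 1.0 - w.slo_target
    if w.total == 0 or budget <= 0:
        return 0.0
    return (w.failed / w.total) / budget


def budget_remaining(w):
    """Fraction of the error budget left, treating the window as the full SLO period."""
    return max(0.0, 1.0 - burn_rate(w))


windows = [
    ServiceWindow("search", 0.99, 50_000, 100),
    ServiceWindow("checkout", 0.999, 100_000, 250),
    ServiceWindow("auth", 0.9995, 200_000, 100),
]

# Sort by burn rate, highest first, as the table's sortable column would.
by_burn = sorted(windows, key=burn_rate, reverse=True)
ranking = [w.name for w in by_burn]
```

With these made-up numbers the ranking is `checkout` (burn rate about 2.5), `auth` (about 1.0), then `search` (about 0.2). `checkout` has already overspent its budget, so `budget_remaining` clamps it to 0.0, while `search` still has roughly 80% left.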

[Screenshot: SLO Compliance table with burn rate]

See every service's SLO compliance in real time

Stop hearing about SLO violations from your customers first. Get the full Service Health Matrix across all 96+ services in minutes.
