Reliability Engineering

Resource saturation, the simple way: live gauges, threshold lines, no PromQL needed

System Gauge is the resource-utilization page. It shows CPU, memory, disk, network, connection counts, and queue depths per host and per service, with the threshold lines from your alert rules drawn directly on the gauges. Use it as your first stop when an SLO is burning hot and you need to know which resource is the bottleneck.


app.novaaiops.com / system-gauge

● LIVE

payments-api · pool-a

cpu: 78%
mem: 54%
disk: 42%
conn: 94%

cpu threshold: warn 70 / page 85
conn threshold: over page line

What's Tracked

Six resource types, every host

CPU, memory, disk, network, connection counts, and queue depth are captured every 10 seconds per host and rolled up per service. The metrics come from the agents already deployed for log and metric collection, so no new sidecar is needed. Bring-your-own Prometheus also works: if you do not want Nova's collectors, the page reads from your existing Prometheus instead.

✓ Six resource types: CPU, memory, disk, network, connection count, and queue depth, the saturation big six
✓ Per-host and per-service: drill into one host's saturation or roll up to service-level percentiles
✓ BYO Prometheus: use Nova's collectors or read from your existing Prometheus, with the same UI and the same thresholds
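The page does not say how per-host values are rolled up to service-level percentiles. As a minimal sketch, assuming a nearest-rank percentile taken over the latest sample from each host, the rollup could look like this (`nearest_rank_percentile` and `service_rollup` are hypothetical names, not a Nova API):

```python
import math


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank into the sorted samples; multiply before dividing to limit float error.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def service_rollup(per_host: dict[str, float], percentiles=(50, 95, 100)) -> dict[str, float]:
    """Roll the latest per-host value of one resource up to service-level percentiles."""
    values = list(per_host.values())
    return {f"p{p}": nearest_rank_percentile(values, p) for p in percentiles}


# Hypothetical latest CPU samples (percent) for four hosts of one service.
cpu_by_host = {"host-a": 40.0, "host-b": 78.0, "host-c": 55.0, "host-d": 91.0}
print(service_rollup(cpu_by_host))
```

For these four hosts the sketch reports a p50 of 55 and a p95 of 91; with so few hosts, the high percentiles collapse onto the maximum, which is one reason a per-host drill-down stays useful.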

app.novaaiops.com / system-gauge · resources

Resources tracked

cpu: user, system, iowait, steal
memory: used, cached, swap
disk: capacity, iops, throughput
network: bandwidth, packet loss, retransmits
connections: active, time-wait, established
queue depth: request queue, Kafka lag, SQS depth

Threshold Lines

Your alerting rules, on the gauge

Every gauge shows the warn and page thresholds from your alert rules as horizontal lines. A threshold crossing is visible at a glance, so you never have to ask whether 78% is bad. Because the thresholds come from your existing alert rules, the page agrees with whatever fires your pager.

✓ From alert rules: thresholds derived from existing alert rules; no separate config to maintain
✓ Warn and page lines: two lines per gauge so you see headroom at both severity levels
✓ Visual breach: over-threshold values are red on the gauge; a sustained breach gets a small badge
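A minimal sketch of how a gauge reading could be placed against the two lines, assuming the warn and page thresholds have already been extracted from alert rules of the form `metric > threshold` and that samples arrive every 10 seconds as described earlier. `Thresholds`, `classify`, `trailing_breach_seconds`, `needs_badge`, and the 60-second badge cutoff are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Warn and page levels, assumed to come from rules shaped like `metric > threshold`."""
    warn: float
    page: float


def classify(value: float, t: Thresholds) -> str:
    """Place a reading relative to the page and warn lines (strict >, matching the assumed rule shape)."""
    if value > t.page:
        return "over page"
    if value > t.warn:
        return "over warn"
    return "ok"


def trailing_breach_seconds(samples: list[float], level: float, interval_s: int = 10) -> int:
    """Duration of the most recent unbroken run of samples above `level`."""
    run = 0
    for v in reversed(samples):  # walk back from the newest sample
        if v <= level:
            break
        run += 1
    return run * interval_s


def needs_badge(samples: list[float], level: float, min_seconds: int = 60) -> bool:
    """Hypothetical rule: badge a breach once it has lasted at least `min_seconds`."""
    return trailing_breach_seconds(samples, level) >= min_seconds


cpu = Thresholds(warn=70.0, page=85.0)
recent = [64.0, 69.0] + [78.0] * 18  # 18 samples x 10 s = 3 minutes over warn
print(classify(recent[-1], cpu), trailing_breach_seconds(recent, cpu.warn), needs_badge(recent, cpu.warn))
```

With the payments CPU values shown on this page (warn 70, page 85, current 78), `classify` places the reading over warn, and 18 consecutive 10-second samples above 70 correspond to the "3 min over warn" breach.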

app.novaaiops.com / system-gauge · thresholds

Thresholds · payments cpu

baseline: 42%
warn line: 70%
page line: 85%
current: 78% (over warn)
last breach: 14:18 (3 min over warn)

Cross-Resource Correlation

When two gauges move together

Saturation rarely lives alone. The page highlights cross-resource correlations, for example a CPU spike on the API host that coincides with a climbing connection count on the database host. The correlations come from the same engine that drives Cross-Signal Correlation; the gauge view is simply its resource-only slice.

✓ Live correlation: gauges that move together get a visual link line; clicking one opens the cross-signal graph
✓ Service-graph aware: correlations only fire across services connected in the service map, which reduces noise
✓ Click to investigate: every correlated pair links to the cross-signal correlation graph for that time window
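The page does not say which statistic produces the correlation score. A minimal sketch, assuming Pearson correlation over two equally sampled windows and a service map represented as a set of directed edges; `pearson`, `correlated_score`, and the 0.8 cutoff are hypothetical:

```python
import math
from typing import Optional


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally sampled series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series has no co-movement to report
    return cov / math.sqrt(vx * vy)


def correlated_score(
    svc_a: str, series_a: list[float],
    svc_b: str, series_b: list[float],
    service_edges: set[tuple[str, str]],
    min_r: float = 0.8,
) -> Optional[float]:
    """Return r if the services are adjacent in the service map and move together strongly, else None."""
    if (svc_a, svc_b) not in service_edges and (svc_b, svc_a) not in service_edges:
        return None  # not connected in the service map: never scored
    r = pearson(series_a, series_b)
    return r if r >= min_r else None  # only positive co-movement counts in this sketch


edges = {("payments-api", "rds")}
api_cpu = [55.0, 61.0, 66.0, 72.0, 78.0]     # hypothetical samples
rds_conn = [70.0, 76.0, 83.0, 89.0, 94.0]    # hypothetical samples
print(correlated_score("payments-api", api_cpu, "rds", rds_conn, edges))
```

The service-map check runs before any arithmetic, so unconnected services are never scored; this mirrors the noise-reduction rule in the list above.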

app.novaaiops.com / system-gauge · correlation

Correlated · payments-api ↔ rds

payments-api · cpu: 78%
rds · connections: 94%
correlation: 0.92 over 30m
likely cause: connection-pool exhaustion

Capacity Forecast

How long until I run out

Each gauge has a small forecast line drawn from the metric's recent slope: "At the current rate, this fills in 14 days." Use it for capacity planning, such as deciding when a disk needs to be bigger or a pool needs to be wider. The forecast updates every hour, so it tracks current usage rather than last quarter's.

✓ Slope-based forecast: simple linear projection from the recent slope; no ML needed for a usable estimate
✓ 14-day default horizon: longer than maintenance windows, shorter than quarterly reviews
✓ Hourly refresh: forecasts update every hour so you do not act on a stale projection
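The forecast is described as a linear projection from the recent slope. A minimal sketch, assuming an ordinary least-squares slope fitted over daily samples; the window length, the sample values, and the names `ols_slope` and `days_until` are assumptions for illustration:

```python
from typing import Optional


def ols_slope(days: list[float], values: list[float]) -> float:
    """Least-squares slope of values against time, in units per day."""
    n = len(days)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(days) / n, sum(values) / n
    num = sum((d - mx) * (v - my) for d, v in zip(days, values))
    den = sum((d - mx) ** 2 for d in days)
    if den == 0:
        raise ValueError("need at least two distinct timestamps")
    return num / den


def days_until(current: float, slope_per_day: float, threshold: float) -> Optional[float]:
    """Linear projection of when `current` reaches `threshold`; None if it never does at this rate."""
    if current >= threshold:
        return 0.0
    if slope_per_day <= 0:
        return None  # flat or shrinking usage never reaches the threshold
    return (threshold - current) / slope_per_day


# Hypothetical daily disk-usage samples (percent) for the last four days.
history_days = [0.0, 1.0, 2.0, 3.0]
history_pct = [59.6, 60.4, 61.2, 62.0]
slope = ols_slope(history_days, history_pct)
print(round(slope, 2), days_until(history_pct[-1], slope, 90.0))
```

Plugging in the prod-rds disk values from the example below (62% now, +0.8 points per day, 90% threshold) gives (90 - 62) / 0.8 = 35 days, which matches the forecast shown.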

app.novaaiops.com / system-gauge · forecast

Forecast · prod-rds disk

current: 62%
slope: +0.8 pts / day
at threshold (90%): in 35 days
recommendation: resize before 2026-06-05
