The Four Golden Signals, Revisited for 2026

Latency, traffic, errors, saturation. The original four still hold; the way you measure them has evolved. This is the 2026 update, with concrete metric definitions.

Latency: percentiles, not averages

Latency is the user-experience signal. Averages lie about tails; percentiles tell the truth.

p50, p95, p99. Track this percentile triple per endpoint; the mean plays no part in SLO decisions.
Averages hide tails. The mean conflates fast and slow requests, so a tail regression vanishes in the aggregate.
Per-endpoint, per-method. A service-level latency hides the bad endpoints behind the good ones; split by route.
Per-region cut. Latency by geography catches regressions specific to one point of presence (PoP) before customers complain.
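
As a minimal sketch of the percentile triple, the Python below groups latency samples by route, method, and region and reports p50, p95, and p99 next to the mean. It uses the nearest-rank definition of a percentile, which is one of several in common use; the function names, the sample tuple layout, and the "/checkout" route are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in [0, 100].

    Returns the smallest sample such that at least p percent of the
    samples are at or below it. Expects the input already sorted.
    """
    if not sorted_values:
        raise ValueError("no samples")
    n = len(sorted_values)
    # Integer ceiling of p * n / 100, avoiding float rounding surprises.
    rank = max(1, (p * n + 99) // 100)
    return sorted_values[rank - 1]


def latency_summary(samples):
    """Summarise (route, method, region, latency_ms) tuples per group.

    The grouping key is an assumption of this sketch; use whatever
    labels your own telemetry carries.
    """
    grouped = defaultdict(list)
    for route, method, region, latency_ms in samples:
        grouped[(route, method, region)].append(latency_ms)

    summary = {}
    for key, values in grouped.items():
        values.sort()
        summary[key] = {
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
            # Kept only to show how little the mean says about the tail.
            "mean": sum(values) / len(values),
        }
    return summary


# Hypothetical data: 98 fast requests and 2 very slow ones.
demo_samples = [("/checkout", "POST", "eu-west", 10)] * 98
demo_samples += [("/checkout", "POST", "eu-west", 1000)] * 2
demo = latency_summary(demo_samples)[("/checkout", "POST", "eu-west")]
```

With this made-up data the mean comes out near 29.8 ms, while p50 and p95 are 10 ms and p99 is 1000 ms. The slow requests are visible only in the tail percentile. In production the percentiles would more likely come from histogram buckets in the metrics backend than from raw samples.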

Errors: rate, not count

Errors are the correctness signal. Rate makes comparison across services possible; raw counts mislead.

Errors per request. Divide errors by total requests over the same window; the resulting rate is comparable across services and traffic levels, and a raw count is not.
4xx vs 5xx. Client errors and server errors warrant different responses; do not collapse the categories.
Error budget. Error rate feeds the SLO budget calculation; the budget drives ship-or-stop decisions.
Named owner per class. Each error class has a responsible team, which avoids 'everyone's-and-no-one's' alerts.
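
The rate and budget ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the input is a plain mapping from HTTP status code to request count, and only 5xx responses are charged against the error budget. Whether any 4xx responses should count is a policy choice the text does not settle.

```python
def error_rates(status_counts):
    """Split errors into client (4xx) and server (5xx) rates per request.

    status_counts maps an HTTP status code to the number of requests
    that returned it within one measurement window.
    """
    total = sum(status_counts.values())
    if total == 0:
        return {"total": 0, "client_error_rate": 0.0, "server_error_rate": 0.0}
    client = sum(n for code, n in status_counts.items() if 400 <= code < 500)
    server = sum(n for code, n in status_counts.items() if 500 <= code < 600)
    return {
        "total": total,
        "client_error_rate": client / total,
        "server_error_rate": server / total,
    }


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent in the window.

    slo_target is the success objective, e.g. 0.999. A negative result
    means the budget is overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        # No budget at all: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 10,000 requests, 200 client errors, 5 server errors.
window = {200: 9795, 404: 200, 500: 5}
rates = error_rates(window)
remaining = budget_remaining(0.999, rates["total"], window[500])
```

In this made-up window the client error rate is 2% and the server error rate is 0.05%. Against a 99.9% objective, 10,000 requests allow about 10 failures, so 5 server errors leave roughly half the budget.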

Saturation: the leading indicator

Saturation is the leading indicator. It fires before the user-visible failure; most teams under-instrument it.

CPU, memory, connection pool. Utilisation gauges per resource; these move first when the system is stressed.
Preventive action. Acting on saturation before symptoms appear is the difference between an investigation and an incident.
Under-instrumented default. Most teams track latency and errors but miss saturation; the cost is missed early warnings.
Alarm threshold. 70 to 80% utilisation is a common watch level; treat it as a starting point and tune per resource based on burst behaviour.
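
A watch-level check over utilisation gauges could look like the following Python sketch. The Gauge type, its field names, the resource names, and the 0.75 default are assumptions made for illustration; the default simply sits inside the 70 to 80% range mentioned above and is overridden per resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gauge:
    """One resource reading. Field names are hypothetical."""
    name: str
    used: float
    capacity: float
    watch_at: float = 0.75  # fraction of capacity; tune per resource


def over_watch_level(gauges):
    """Return (name, utilisation) for each gauge at or above its watch level."""
    flagged = []
    for gauge in gauges:
        if gauge.capacity <= 0:
            raise ValueError(f"{gauge.name}: capacity must be positive")
        utilisation = gauge.used / gauge.capacity
        if utilisation >= gauge.watch_at:
            flagged.append((gauge.name, utilisation))
    return flagged


# Hypothetical readings: CPU percent, memory in GiB, pool connections.
readings = [
    Gauge("cpu", used=62, capacity=100),
    Gauge("memory", used=6, capacity=16),
    Gauge("db_connection_pool", used=45, capacity=50, watch_at=0.8),
]
flagged = over_watch_level(readings)
```

Here only the connection pool is flagged, at 90% of capacity against an 80% watch level, while CPU and memory stay below theirs. A bursty resource might warrant a lower watch level than a steady one, which is why the threshold lives on the gauge rather than in the check.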