Network Monitoring: The Five Numbers
Network monitoring often lives only with the network team. SREs benefit when these five numbers are visible to them as well.
Why five
Network issues hide inside application metrics and surface only when the wire itself is observable.
These five metrics catch most of them.
The five metrics
1. Packet loss rate.
2. Latency p99 to dependencies.
3. Connection-error rate.
4. DNS resolution time p99.
5. TLS handshake time p99.
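As a rough illustration of how these numbers can be derived from raw samples, here is a minimal sketch. It assumes you already collect per-request timings and event counts somewhere; the function names `p99` and `rate` are hypothetical, and the nearest-rank percentile is only one of several common definitions.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.99 * n, avoiding floating-point rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def rate(bad, total):
    """Fraction of bad events (lost packets, failed connections)."""
    # Report 0.0 when nothing was observed instead of dividing by zero.
    return bad / total if total else 0.0
```

Metrics 2, 4, and 5 would use `p99` over latency, DNS, and TLS handshake timings; metrics 1 and 3 would use `rate` over packet and connection counts.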
Dashboard pattern
One panel per metric, each showing the trend over the last 24 hours.
Add a per-dependency drill-down for metrics 2 through 5.
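The per-dependency drill-down amounts to grouping samples by dependency before aggregating. A minimal sketch, assuming samples arrive as `(dependency, value_ms)` pairs; the function name and the data shape are illustrative, not taken from any particular monitoring tool.

```python
from collections import defaultdict


def p99_by_dependency(samples):
    """Return {dependency: nearest-rank p99} for (dependency, value_ms) pairs."""
    grouped = defaultdict(list)
    for dependency, value_ms in samples:
        grouped[dependency].append(value_ms)

    result = {}
    for dependency, values in grouped.items():
        values.sort()
        # Integer ceiling of 0.99 * n gives the 1-based nearest rank.
        rank = -(-99 * len(values) // 100)
        result[dependency] = values[rank - 1]
    return result
```

Each key in the result can back one row or series in the drill-down view, so a slow dependency stands out instead of being averaged into a single global figure.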
Alert thresholds
Packet loss: alert above 0.1% sustained.
Latency: alert on a 50% increase in p99 from baseline.
Connection errors: alert at 2x baseline.
DNS: alert above 100ms p99.
TLS: alert above 200ms p99.
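The thresholds above can be sketched as a single check against a baseline. This is an illustration under stated assumptions, not a definitive rule set: the `Reading` type and `fired_alerts` function are hypothetical, packet loss is expressed as a fraction, and the "sustained" condition (how long a breach must last) is left out because the duration is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    packet_loss: float       # fraction of packets lost; 0.001 means 0.1%
    latency_p99_ms: float
    conn_error_rate: float   # fraction of connections that failed
    dns_p99_ms: float
    tls_p99_ms: float


def fired_alerts(current, baseline):
    """Return the names of thresholds that `current` breaches."""
    fired = []
    if current.packet_loss > 0.001:                               # above 0.1%
        fired.append("packet_loss")
    if current.latency_p99_ms >= 1.5 * baseline.latency_p99_ms:   # 50% increase
        fired.append("latency")
    # Assumption: a zero error rate never alerts, even against a zero baseline.
    if (current.conn_error_rate > 0
            and current.conn_error_rate >= 2 * baseline.conn_error_rate):
        fired.append("conn_errors")
    if current.dns_p99_ms > 100:                                  # above 100ms
        fired.append("dns")
    if current.tls_p99_ms > 200:                                  # above 200ms
        fired.append("tls")
    return fired
```

In practice this check would run per dependency and only page after the breach has persisted for a window you choose.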
Antipatterns
- App-only monitoring. Misses network root causes.
- One global metric. Hides per-dependency issues.
- Thresholds without a baseline. Alerts fire too often or too rarely.
What to do this week
Three moves. (1) Apply this pattern to your highest-risk network path. (2) Measure the failure rate on that path before and after. (3) Document the change so the next incident responder inherits the knowledge.