Database Monitoring: The Five Numbers That Matter
Monitor too few metrics and you miss issues; monitor too many and the real problems drown in noise. Five is the right number.
Why five
One metric misses whole classes of failure. Twenty metrics drown the signal.
In practice, five well-chosen metrics catch roughly 90% of operational problems.
The five metrics
1. Connection count vs. max connections.
2. Slow query rate.
3. Replication lag.
4. Lock wait time.
5. Disk usage growth.
Dashboard pattern
One panel per metric; trend over 24h; threshold lines visible.
Single dashboard URL; bookmarked by on-call; the database health view.
Alert thresholds
Connection count: alert at 80% of max.
Slow query rate: alert when it doubles versus baseline.
Replication lag: alert at 30s.
Lock wait: alert at 5s p99.
Disk growth: alert when the 7-day projection exceeds 80% of capacity.
Antipatterns
- Monitoring without a baseline. You cannot detect an anomaly without knowing what normal looks like.
- Twenty metrics. The signal drowns in noise.
- No alerts on the five. Problems are discovered through incidents.
What to do this week
Three moves. (1) Apply this pattern to your most-loaded database. (2) Measure query latency and write throughput before and after. (3) Document the win and the constraint so the next refactor inherits the knowledge.