DNS Monitoring
Track resolution.
What to monitor
Resolution time per query. p99 should stay under 50ms typical.
Resolution success rate. Below 99.9% indicates resolver or authoritative issues.
Cache hit rate at the resolver. Below 90% suggests TTLs are too short or load is unusual.
Authoritative DNS monitoring
Per-zone query volume. Spikes indicate either traffic surges or cache invalidation.
Per-record query volume. Most-queried records are candidates for longer TTLs.
Authoritative server health: response time, packet drop, error rate.
Synthetic DNS probes
Probe specific records from multiple geographies. Detect regional resolution failures.
Monitor TTL behaviour. Expected TTL versus observed. Drift catches misconfiguration.
Catches issues before customer reports. Continuous monitoring beats reactive.
Alerting on DNS
Failed resolutions: page immediately. DNS failure cascades quickly to many systems.
TTL anomalies: warning rather than page. Often config drift; investigate during business hours.
Cache miss spikes: warning. May indicate scaling issue at resolver tier.