DNS Caching Layers
OS, app, resolver.
Overview
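DNS responses are cached at multiple layers. Each layer applies its own TTL semantics, and not every layer honours the TTL published on the record; knowing which layer caches what, and for how long, is the difference between a clean failover and a 30-minute outage.
- Application caching. The JVM's InetAddress cache is the usual example. Go's standard net.Resolver does not cache on its own, so any application-level cache in a Go service comes from a library or a local caching resolver. Some runtimes cache for the lifetime of the process under certain configurations; the JVM does so when a security manager is installed.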
- OS resolver cache. systemd-resolved, nscd. Caches at the host level across processes.
- Recursive resolver cache. ISP or cloud resolver. The DNS layer most operators forget about.
- TTL drives behaviour. Short TTL for failover-critical records; long TTL for steady-state performance.
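How the layers above combine can be sketched with a rough worst-case estimate. The model is a simplifying assumption, not a measurement: caches that honour the TTL (recursive resolver, OS cache) are treated as passing the remaining TTL downstream, so together they contribute one TTL window, while any layer that runs its own timer adds its full hold time on top. The function name and parameters are illustrative.

```python
import math


def worst_case_staleness(record_ttl_s, fixed_cache_ttls_s):
    """Rough upper bound, in seconds, on how long a client may keep an old answer.

    record_ttl_s: TTL published on the record. TTL-honouring caches are assumed
        to hand the remaining TTL downstream, so they share one TTL window.
    fixed_cache_ttls_s: hold times of layers that ignore the record TTL and use
        their own timer (math.inf for "cache until the process exits").
    """
    return record_ttl_s + sum(fixed_cache_ttls_s)


# 60 s record behind an application cache with a fixed 30 s hold time
print(worst_case_staleness(60, [30]))        # 90
# Same record behind a runtime that caches for the life of the process
print(worst_case_staleness(60, [math.inf]))  # inf
```

Resolvers that clamp short TTLs to a minimum are not modelled here; where that applies, the clamped value would stand in for record_ttl_s.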
The approach
Four habits make DNS caching predictable: short TTL for failover, application-level cache tuning, resolution-time monitoring, and game-day failover testing.
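- Short TTL for failover-critical records. 60 seconds for records that must move during incidents. The TTL is the upper bound on recovery time only when every cache layer honours it; a layer that clamps or ignores the TTL extends the window.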
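- Application cache tuning. Set the JVM security property networkaddress.cache.ttl explicitly. Go has no built-in DNS cache, so in Go services set the TTL on whatever caching library or local resolver is in use. Override any default-forever behaviour.
- Monitor resolution time. Per-resolver latency dashboard. Investigations have one place to start.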
- Documented topology plus failover tests. Per-tier cache layer documented; game-day exercises validate the failover actually works.
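The application-level tuning and resolution-time monitoring described above can be sketched as a small cache with a hard age cap. This is a minimal illustration under stated assumptions, not a drop-in resolver: CappedDnsCache, its parameters, and the hostname db.example.internal are hypothetical, and because socket.gethostbyname does not expose the record TTL, the cap is a fixed value chosen by the operator rather than the TTL from the response.

```python
import socket
import time


class CappedDnsCache:
    """Application-level cache that never serves an answer older than max_age_s."""

    def __init__(self, max_age_s=60.0, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self._max_age_s = max_age_s
        self._resolve = resolve      # injectable so the cache can be tested offline
        self._clock = clock
        self._entries = {}           # hostname -> (address, expiry time)
        self.last_lookup_s = None    # duration of the most recent upstream lookup

    def lookup(self, hostname):
        cached = self._entries.get(hostname)
        if cached is not None and self._clock() < cached[1]:
            return cached[0]
        started = self._clock()
        address = self._resolve(hostname)
        finished = self._clock()
        self.last_lookup_s = finished - started   # value to export to a dashboard
        self._entries[hostname] = (address, finished + self._max_age_s)
        return address


# Offline demonstration with a fake resolver and a fake clock.
answers = iter(["10.0.0.1", "10.0.0.2"])
fake_now = [0.0]
cache = CappedDnsCache(max_age_s=60, resolve=lambda host: next(answers),
                       clock=lambda: fake_now[0])

first = cache.lookup("db.example.internal")
fake_now[0] = 59.0
still_cached = cache.lookup("db.example.internal")
fake_now[0] = 61.0
after_expiry = cache.lookup("db.example.internal")
print(first, still_cached, after_expiry)  # 10.0.0.1 10.0.0.1 10.0.0.2
```

Injecting the resolver and the clock keeps the expiry logic testable without network access; in a real service the defaults would go through the host's resolver stack.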
Why this compounds
The first DNS-failover test surfaces every cache layer the team did not know about. Subsequent tests reuse the patterns; new services ship with TTL choices that match the recovery expectations.
- Better failover behaviour. The right TTL produces predictable recovery. Customers see the DNS change within the TTL window, provided every cache layer in the path honours it.
- Better performance. Long TTL where appropriate produces fast resolution. The cache pays back in latency.
- Better incident response. Cache-layer awareness shortens DNS-flavoured incident investigations measurably.
- Year-one investment, year-two habit. The first tuning is a heavy lift. By year two, every new record ships with a deliberate TTL choice.
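Whether clients really see a DNS change within the expected window can be checked during a game-day exercise with a polling sketch along these lines. The function name, its parameters, and the hostname api.example.internal are assumptions for illustration. It resolves through whatever cache layers sit beneath the calling process and returns a single IPv4 address per lookup, so it measures one vantage point, not every client.

```python
import socket
import time


def seconds_until_resolves_to(hostname, expected_address, timeout_s,
                              interval_s=1.0, resolve=socket.gethostbyname,
                              clock=time.monotonic, sleep=time.sleep):
    """Poll until hostname resolves to expected_address.

    Returns the elapsed seconds, or None if timeout_s passes first.
    """
    started = clock()
    while True:
        try:
            current = resolve(hostname)
        except OSError:          # socket.gaierror is a subclass of OSError
            current = None
        elapsed = clock() - started
        if current == expected_address:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)


# Offline demonstration: the fake resolver switches answers at t = 45 s.
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


def fake_resolve(host):
    return "10.0.0.1" if fake_now[0] < 45 else "10.0.0.2"


took = seconds_until_resolves_to("api.example.internal", "10.0.0.2",
                                 timeout_s=120, interval_s=5.0,
                                 resolve=fake_resolve,
                                 clock=lambda: fake_now[0], sleep=fake_sleep)
print(took)  # 45.0
```

Comparing the returned value against the record's TTL plus any known application hold time gives a pass or fail signal for the exercise.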