DNS Caching Layers

Where DNS answers get cached: application, OS, and recursive resolver.

Overview

DNS responses are cached at multiple layers. Each layer applies its own TTL semantics, and not every layer honors the TTL published on the record. Knowing which layer caches what is the difference between a clean failover and a 30-minute outage.

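Application caching. The JVM's InetAddress cache, or a caching library added to the application. Go's standard net.Resolver does not cache answers itself, so any caching in a Go service comes from a library or a local stub resolver. Some runtimes cache for the lifetime of the process: the JVM caches successful lookups forever when a security manager is installed.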
OS resolver cache. systemd-resolved, nscd. Caches at the host level across processes.
Recursive resolver cache. ISP or cloud resolver. The DNS layer most operators forget about.
TTL drives behaviour. Short TTL for failover-critical records; long TTL for steady-state performance.
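
To make the layering concrete, here is a minimal simulation sketch of the three layers above. It is a toy model, not a resolver: the class names, the host api.example.test, the addresses, and the 60-second TTL are all hypothetical, and it assumes every layer is warmed at the same instant. It shows how one layer that replaces the record's TTL with its own (here, cache-forever) keeps serving the old address after the other layers have expired.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Answer = Tuple[str, float]  # (address, remaining TTL in seconds)


class Authoritative:
    """Hypothetical authoritative zone: one TTL for every record."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.records: Dict[str, str] = {}

    def lookup(self, host: str, now: float) -> Answer:
        return self.records[host], self.ttl


@dataclass
class CacheLayer:
    """One caching layer in front of an upstream lookup function."""

    name: str
    upstream: Callable[[str, float], Answer]
    # None honors the TTL handed down by upstream; a number replaces it
    # (float("inf") models a cache-forever runtime default).
    override_ttl: Optional[float] = None
    store: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def lookup(self, host: str, now: float) -> Answer:
        hit = self.store.get(host)
        if hit is not None and now < hit[1]:
            # Still fresh: serve the cached address with its remaining TTL.
            return hit[0], hit[1] - now
        address, ttl = self.upstream(host, now)
        if self.override_ttl is not None:
            ttl = self.override_ttl
        self.store[host] = (address, now + ttl)
        return address, ttl


def address_after_failover(app_override: Optional[float], at: float) -> str:
    """Warm the caches at t=0, repoint the record at t=1, look up again at `at`."""
    host = "api.example.test"
    auth = Authoritative(ttl=60.0)
    auth.records[host] = "10.0.0.1"
    recursive = CacheLayer("recursive", auth.lookup)
    os_cache = CacheLayer("os", recursive.lookup)
    app = CacheLayer("app", os_cache.lookup, override_ttl=app_override)

    app.lookup(host, now=0.0)        # every layer now caches 10.0.0.1
    auth.records[host] = "10.0.0.2"  # failover at t=1
    address, _ = app.lookup(host, now=at)
    return address
```

In this model, address_after_failover(None, 61.0) returns the new address because all three layers expire at t=60, while address_after_failover(float("inf"), 3600.0) still returns the old one an hour later. Real caches are warmed at different times and may clamp TTLs, so treat this as an illustration of the failure mode rather than a prediction of timing.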

The approach

Four habits make DNS caching predictable: short TTLs for failover-critical records, application-level cache tuning, resolution-time monitoring, and a documented cache topology backed by game-day failover testing.

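Short TTL for failover-critical. 60 seconds for records that must move during incidents. The TTL bounds recovery time only when every cache layer honors it; a layer that overrides the TTL extends recovery by however long it holds the old answer.
Application cache tuning. Set the JVM's networkaddress.cache.ttl security property; in Go, set the TTL on whatever caching library is in use, since the standard resolver does not cache. Override any cache-forever default.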
Monitor resolution time. A per-resolver latency dashboard gives investigations one place to start.
Documented topology plus failover tests. Per-tier cache layer documented; game-day exercises validate the failover actually works.
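
As a starting point for the resolution-time habit above, here is a small sketch that times repeated lookups from the application's point of view. The function name time_lookups and its parameters are hypothetical. It uses the standard library's socket.getaddrinfo, which goes through the host's normal resolution path, so it measures what a process on that host experiences rather than the latency of one specific recursive resolver; querying a named resolver directly would need a DNS library. The resolver and clock are injectable so the function can be exercised without network access.

```python
import socket
import statistics
import time
from typing import Any, Callable, Dict, List


def time_lookups(
    host: str,
    attempts: int = 5,
    resolve: Callable[..., Any] = socket.getaddrinfo,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Any]:
    """Resolve `host` several times and summarize latency in milliseconds."""
    samples_ms: List[float] = []
    failures = 0
    for _ in range(attempts):
        start = clock()
        try:
            resolve(host, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
            continue
        samples_ms.append((clock() - start) * 1000.0)

    summary: Dict[str, Any] = {
        "host": host,
        "attempts": attempts,
        "failures": failures,
        "min_ms": None,
        "median_ms": None,
        "max_ms": None,
    }
    if samples_ms:
        summary["min_ms"] = min(samples_ms)
        summary["median_ms"] = statistics.median(samples_ms)
        summary["max_ms"] = max(samples_ms)
    return summary
```

Calling time_lookups("localhost") returns a dictionary that can be shipped to whatever metrics system feeds the dashboard. A first sample that is much slower than the rest is often a sign that a cache below the application answered the later lookups, which is itself useful evidence when documenting the topology.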

Why this compounds

The first DNS-failover test surfaces every cache layer the team did not know about. Subsequent tests reuse the patterns; new services ship with TTL choices that match the recovery expectations.

Better failover behaviour. The right TTL produces predictable recovery. Customers see the DNS change within the TTL window, provided every cache layer in the path honors it.
Better performance. Long TTL where appropriate produces fast resolution. The cache pays back in latency.
Better incident response. Knowing which layer can hold a stale answer shortens DNS-related incident investigations.
Year-one investment, year-two habit. The first round of tuning is a heavy lift. By year two, every new record ships with a deliberate TTL choice.