The DNS Resolution Agent: Why It's a Good First Project

Bounded scope, read-only signals, and clear success criteria: why the DNS investigation agent is the project to ship before harder ones, plus the skeleton.

Why this scope is good

The scope is what makes DNS the right project zero. The inputs are bounded: a hostname, a query type, and an environment, and nothing else. The signals are read-only: dig, nslookup, and DNS cache contents, so nothing the agent does changes the world. The success criterion is clear: did the resolution succeed, what was the answer, and if it failed, why?

Bounded inputs. Hostname, query type, environment; three inputs, nothing else.
Read-only signals. dig, nslookup, DNS cache contents; the agent doesn’t change the world.
Clear success criterion. Did resolution succeed; what was the answer; if it failed, why.
Per-axis simplicity. Each axis is constrained; the project is tractable from day one.
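
The three inputs can be pinned down as a small validated type, so anything outside the scope is rejected before a tool runs. This is a minimal sketch: the names DnsRequest and ALLOWED_QTYPES are hypothetical, and the accepted query types are an illustrative subset rather than a complete list.

```python
from dataclasses import dataclass

# Query types this sketch accepts (an assumption: an illustrative subset only).
ALLOWED_QTYPES = {"A", "AAAA", "CNAME", "MX", "NS", "TXT"}


@dataclass(frozen=True)
class DnsRequest:
    """The agent's entire input surface: hostname, query type, environment."""
    hostname: str
    qtype: str
    environment: str

    def __post_init__(self):
        # Reject anything outside the bounded scope up front.
        if not self.hostname or " " in self.hostname:
            raise ValueError(f"invalid hostname: {self.hostname!r}")
        if self.qtype.upper() not in ALLOWED_QTYPES:
            raise ValueError(f"unsupported query type: {self.qtype!r}")
        if not self.environment:
            raise ValueError("environment is required")
```

For example, DnsRequest("api.example.com", "A", "staging") constructs normally, while an unknown query type raises ValueError.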

Tools the agent needs

The agent needs three tools. A dig wrapper queries authoritative servers and returns structured output. A cache lookup checks local resolver caches and catches stale-cache issues. A DNS-DB query pulls records from the authoritative source for the zone and compares them with what the resolver returned.

dig wrapper. Queries authoritative servers; returns structured output.
Cache lookup. Checks local resolver caches; catches stale-cache issues.
DNS-DB query. Pulls records from the authoritative source; compares them with the resolver answer.
Per-tool structured output. Each tool returns parseable structure; supports the classification step.
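
As one possible shape for the dig wrapper, the sketch below shells out to dig and turns its text output into records the classification step can consume. It assumes dig is installed and uses its +noall, +comments, and +answer display options, so that only the header comments (which carry the response status) and the answer section are printed. The names Record, DigResult, parse_dig_output, and run_dig are hypothetical.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    name: str
    ttl: int
    rclass: str
    rtype: str
    rdata: str


@dataclass(frozen=True)
class DigResult:
    status: Optional[str]  # e.g. "NOERROR" or "NXDOMAIN"; None if no header was seen
    records: List[Record]


_STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def parse_dig_output(text: str) -> DigResult:
    """Parse text printed by `dig +noall +comments +answer`."""
    status = None
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            # Comment lines; the header comment carries "status: ...".
            match = _STATUS_RE.search(line)
            if match and status is None:
                status = match.group(1)
            continue
        # Answer lines look like: name TTL class type rdata
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[1].isdigit():
            name, ttl, rclass, rtype, rdata = parts
            records.append(Record(name, int(ttl), rclass, rtype, rdata))
    return DigResult(status, records)


def run_dig(hostname: str, qtype: str = "A",
            server: Optional[str] = None, timeout: float = 5.0) -> DigResult:
    """Run a read-only dig query; raises if dig exits non-zero or times out."""
    cmd = ["dig"]
    if server:
        cmd.append(f"@{server}")  # query a specific (e.g. authoritative) server
    cmd += [hostname, qtype, "+noall", "+comments", "+answer"]
    done = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True)
    return parse_dig_output(done.stdout)
```

Keeping the parser separate from the subprocess call means it can be tested against captured dig output without any network access.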

Output classes

The agent classifies each investigation into one of four buckets: resolved correctly (most cases; the agent confirms and exits); NXDOMAIN (the hostname does not exist, sometimes because of a typo, sometimes intentionally); stale cache (the resolver returned an old answer; a cache flush usually fixes it); and authoritative misconfiguration (the zone has a problem; escalate to the team that owns the zone).

Resolved correctly. Most cases; agent confirms and exits.
NXDOMAIN. Hostname does not exist; sometimes a typo, sometimes intentional.
Stale cache. Resolver returned an old answer; a cache flush usually fixes it.
Authoritative misconfiguration. Zone has a problem; escalate to zone owner.
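
The four buckets map onto a small decision function over the three tool outputs. The sketch below is one possible ordering of the checks, not a definitive rule set. It assumes each tool's answer has been reduced to a set of record values, and it treats "no records anywhere" as NXDOMAIN, where a fuller version would check the response code to separate NXDOMAIN from an empty answer. The names Outcome and classify are hypothetical.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    RESOLVED = "resolved correctly"
    NXDOMAIN = "NXDOMAIN"
    STALE_CACHE = "stale cache"
    AUTH_MISCONFIG = "authoritative misconfiguration"


def classify(resolver_answer: Iterable[str],
             authoritative_answer: Iterable[str],
             db_records: Iterable[str]) -> Outcome:
    """Classify one investigation from the three tools' record sets."""
    resolver = frozenset(resolver_answer)
    authoritative = frozenset(authoritative_answer)
    expected = frozenset(db_records)

    # The authoritative servers disagree with the DNS-DB: the zone has a problem.
    if authoritative != expected:
        return Outcome.AUTH_MISCONFIG
    # The zone is right but the resolver serves something else: stale cache.
    if resolver != authoritative:
        return Outcome.STALE_CACHE
    # Everyone agrees the name has no records.
    if not resolver:
        return Outcome.NXDOMAIN
    return Outcome.RESOLVED
```

For instance, classify({"192.0.2.9"}, {"192.0.2.10"}, {"192.0.2.10"}) returns Outcome.STALE_CACHE, because the resolver disagrees with an otherwise consistent zone. Setups that legitimately return different answers per resolver (round-robin or geo-aware DNS) would need a looser comparison than strict set equality.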

Why this is project zero

DNS is project zero because the failure modes are bounded and the runbooks already exist. DNS issues are common and well understood, and the team has prior runbooks, so the agent is a translation rather than a from-scratch design. Failure modes are bounded: the worst the agent can do is return a wrong classification, and humans verify before acting. Success is also satisfying: a correct classification in 3 seconds produces team buy-in.

Common and well-understood. Prior runbooks exist; the agent is a translation, not from-scratch.
Bounded failure modes. Worst case is wrong classification; humans verify before acting.
Satisfying success. Correct classification in 3 seconds produces team buy-in.
Per-team confidence build. Project zero produces the muscle for harder agents.

What to graduate to next

The graduation path is concrete. After DNS, try certificate-expiry investigation, which has a similar bounded scope and similar read-only signals. After that, try external-service-status checks; the probe-classify-report pattern is common across many SRE problems. By the third agent, the team has internalised the agent-building pattern, and subsequent agents come faster.

Cert-expiry investigation next. Similar bounded scope, similar read-only signals; natural progression.
External-service-status checks. Probe, classify, report; common pattern across SRE problems.
Pattern internalised by the third agent. Subsequent agents come faster.
Per-team capability ramp. Each agent grows the team’s agent-building muscle.
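
The probe-classify-report pattern can be written once and reused across agents. The sketch below is a hypothetical skeleton, not a prescribed framework: investigate, Report, and the callables passed in are all illustrative names, and the 30-day threshold in the usage note is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

S = TypeVar("S")  # raw signals gathered by a read-only probe


@dataclass(frozen=True)
class Report:
    subject: str
    label: str
    evidence: str


def investigate(subject: str,
                probe: Callable[[str], S],
                classify: Callable[[S], str]) -> Report:
    """Probe (read-only), classify into a bucket, and report with evidence."""
    signals = probe(subject)
    label = classify(signals)
    return Report(subject=subject, label=label, evidence=repr(signals))
```

A certificate-expiry agent would then only supply its own probe and classifier, for example a probe that returns the days until expiry and a classifier such as `lambda days: "expiring soon" if days < 30 else "ok"`.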