Agentic SRE Advanced By Samson Tanimawo, PhD Published May 14, 2026 5 min read

SSL Certificate Expiry: Detection, Renewal, Rollout

Three problems, three sub-agents, one orchestrator. The split, the integration with cert-manager, and the dry-run output that an SRE can sanity-check.

Three problems, three sub-agents

Detection sub-agent: scan certs daily, flag those expiring in <30 days.

Renewal sub-agent: trigger cert-manager (or your renewal flow) for flagged certs.

Rollout sub-agent: deploy the renewed cert and verify the endpoint is serving the new chain.

The orchestrator

Code, not LLM. The orchestrator is a state machine: detect → renew → rollout. Each transition is deterministic.

LLM is invoked only for non-routine cases: cert types the renewal sub-agent does not know how to renew, hosts that did not pick up the new cert.

The split keeps the routine cases cheap and fast.

Dry-run output

Each step emits a dry-run summary: what cert, what host, what the agent would do, expected duration.

The summary is reviewable by SREs before any action. Most reviews approve in seconds; the dry-run is the safety check.

Dry-run output also serves as documentation: the cert renewal flow is described, by the agent, in plain language.

Verification after rollout

Probe the endpoint with openssl s_client; confirm the served cert matches the expected one.

Probe from multiple network paths if relevant. Internal vs external; the cert may differ.

Verify within 5 minutes of rollout. If verification fails, alert immediately; cert mis-rollout breaks customer connections.

Escalation cases

Renewal failed: the CA rejected, rate-limited, or returned an error. Surface the error; human investigates.

Rollout failed: the new cert was issued but not picked up by the server. Configuration issue; human investigates.

Verification failed: the rolled-out cert is not what is being served. Possible deployment issue; the agent does not auto-revert this; human investigates.