Agentic SRE | Advanced | By Samson Tanimawo, PhD | Published Apr 2, 2026 | 5 min read

Wiring an SRE Agent into PagerDuty

Webhooks in. Acknowledgements out. The integration code, the auth pattern, the retry policy, and the bug that took us six weeks to find.

Webhooks in

PagerDuty fires a webhook on incident creation. The agent service receives it, verifies the signature, and extracts the incident details.

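Signature verification is non-negotiable. Without it, anyone who discovers the endpoint can forge incidents and trigger agent runs; the endpoint becomes both a spoofing surface and a denial-of-service surface.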
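A minimal sketch of the verification step, assuming PagerDuty's v3 webhook format, where the X-PagerDuty-Signature header carries one or more comma-separated "v1=<hex HMAC-SHA256>" values computed over the raw request body. The function name and the way the secret is passed in are illustrative; check the header format against PagerDuty's current documentation before relying on it.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Return True if any v1 signature in the header matches the raw body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    for candidate in header_value.split(","):
        version, _, digest = candidate.strip().partition("=")
        if version != "v1":
            continue
        # Constant-time comparison; bytes on both sides avoids a TypeError
        # if an attacker sends non-ASCII characters in the header.
        if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
            return True
    return False
```

Pass the body exactly as it came off the wire (for example request.get_data() in Flask or await request.body() in Starlette), not a re-serialised copy of the parsed JSON.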

Extract: incident id, urgency, description, service id, escalation policy. These five fields cover most agent use cases.
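The extraction can be sketched as below. The field paths are an assumption modelled on the v3 webhook shape (an "event" object whose "data" holds the incident, with the description carried in "title"); the IncidentSummary type and extract_incident function are hypothetical names, not part of any PagerDuty SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncidentSummary:
    incident_id: str
    urgency: Optional[str]
    description: Optional[str]
    service_id: Optional[str]
    escalation_policy_id: Optional[str]


def extract_incident(raw_body: bytes) -> IncidentSummary:
    """Pull the five fields out of an already-verified webhook body."""
    data = json.loads(raw_body)["event"]["data"]
    return IncidentSummary(
        incident_id=data["id"],  # required: fail loudly if it is missing
        urgency=data.get("urgency"),
        description=data.get("title"),  # assumed location of the description
        service_id=(data.get("service") or {}).get("id"),
        escalation_policy_id=(data.get("escalation_policy") or {}).get("id"),
    )
```

Only the incident id is treated as mandatory here; the other four fields fall back to None so that a sparse payload still produces a run the agent can triage.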

Acknowledgements out

When the agent starts triaging, it acknowledges the incident in PagerDuty. The acknowledgement is silent to the team; the agent's name is the acknowledger.

When the agent finishes, it adds a note with the hypothesis. The note appears in the incident timeline.

If the agent escalates, it does not auto-resolve; the human takes over and resolves.
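The two outbound calls can be sketched with the standard library alone. This assumes PagerDuty's REST API v2 conventions (PUT /incidents/{id} to change status, POST /incidents/{id}/notes to add a note, token auth, and a From header identifying the acting user); the helper names are hypothetical, and the functions only build the requests so they can be inspected or tested before anything is sent.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"


def _request(method: str, path: str, api_key: str, from_email: str, payload: dict):
    """Build (but do not send) an authenticated PagerDuty API request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            "From": from_email,  # the agent's own PagerDuty user
        },
    )


def acknowledge_request(incident_id: str, api_key: str, from_email: str):
    """Request that marks the incident acknowledged by the agent's user."""
    body = {"incident": {"type": "incident_reference", "status": "acknowledged"}}
    return _request("PUT", f"/incidents/{incident_id}", api_key, from_email, body)


def note_request(incident_id: str, hypothesis: str, api_key: str, from_email: str):
    """Request that adds the agent's hypothesis to the incident timeline."""
    body = {"note": {"content": hypothesis}}
    return _request("POST", f"/incidents/{incident_id}/notes", api_key, from_email, body)
```

Sending is then urllib.request.urlopen(req, timeout=10). There is deliberately no helper that sets the status to resolved: resolution stays with the human.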

Auth pattern

API key per environment. Rotate quarterly; do not share across environments.

Scopes: minimum required. "Acknowledge incidents and add notes" is enough for most agents.

Audit log enabled. PagerDuty's own audit log shows what the agent did; useful for post-incident review.
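One way to enforce the one-key-per-environment rule at startup, as a sketch. The environment names and the PAGERDUTY_API_KEY_<ENV> variable naming are assumptions made for illustration; substitute whatever your secret store provides.

```python
import os
from typing import Mapping

ENVIRONMENTS = ("dev", "staging", "prod")  # hypothetical environment names


def _variable(environment: str) -> str:
    return f"PAGERDUTY_API_KEY_{environment.upper()}"


def load_api_key(environment: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the key for exactly one environment, or fail at startup."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    key = environ.get(_variable(environment), "")
    if not key:
        raise RuntimeError(f"{_variable(environment)} is not set")
    return key


def assert_keys_not_shared(environ: Mapping[str, str] = os.environ) -> None:
    """Raise if the same key value is configured for two environments."""
    seen = {}
    for environment in ENVIRONMENTS:
        key = environ.get(_variable(environment))
        if not key:
            continue
        if key in seen:
            raise RuntimeError(f"{environment} and {seen[key]} share an API key")
        seen[key] = environment
```

The sharing check is only useful where more than one key is visible, such as a CI job that validates configuration; a running service should normally see only its own key.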

Retry policy

Idempotent operations: retry up to 3 times with exponential backoff.

Non-idempotent operations (creating notes): retry once with a deduplication token. The dedup prevents double-notes on transient failures.

Hard cap: 5 retries total per webhook, across all operations. Beyond that, log the failure and stop. The agent's job is best-effort; PagerDuty remains the source of truth.
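The three rules above can be sketched as one helper plus a per-webhook budget. Everything named here (TransientError, RetryBudget, call_with_retry) is hypothetical, and where the deduplication token is checked is an assumption: this sketch just generates one token and passes the same value to every attempt, leaving it to the operation to skip work it has already done.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Raised by an operation for failures worth retrying (timeouts, 5xx)."""


class RetryBudgetExceeded(Exception):
    """Raised when a webhook has used up its hard cap of retries."""


class RetryBudget:
    """Caps retries across every operation triggered by one webhook."""

    def __init__(self, hard_cap: int = 5):
        self.remaining = hard_cap

    def spend(self) -> None:
        if self.remaining <= 0:
            raise RetryBudgetExceeded("retry budget for this webhook is spent")
        self.remaining -= 1


def call_with_retry(operation, budget, idempotent, base_delay=0.5, sleep=time.sleep):
    """Run operation(dedup_token), retrying transient failures with backoff."""
    max_retries = 3 if idempotent else 1
    # Non-idempotent calls reuse one token so a retry can be recognised.
    dedup_token = None if idempotent else str(uuid.uuid4())
    attempt = 0
    while True:
        try:
            return operation(dedup_token)
        except TransientError:
            if attempt >= max_retries:
                raise
            budget.spend()  # raises RetryBudgetExceeded past the hard cap
            # Exponential backoff (0.5s, 1s, 2s by default) plus a little jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
            attempt += 1
```

Create one RetryBudget per incoming webhook and pass it to every call made on that webhook's behalf; the caller catches RetryBudgetExceeded or the final TransientError, logs it, and stops.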

The bug we hunted for six weeks

Webhook signatures occasionally appeared invalid. The error rate was 0.3%; not enough to alert, enough to lose runs.

Root cause: PagerDuty's webhook payload contains a timestamp; we were stripping trailing whitespace from the body before verifying. The signature was computed on the original body, so whenever stripping changed even one byte, the HMAC no longer matched.

Lesson: when verifying webhooks, do not normalise the body. Verify the bytes you received, exactly. The fix was three lines; the diagnosis was six weeks.
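A regression test for this class of bug might look like the sketch below. The secret and payload are invented for illustration, and the trailing newline stands in for whatever whitespace the real payloads carried; the point is only that the HMAC over the stripped body differs from the HMAC over the received bytes.

```python
import hashlib
import hmac

SECRET = b"example-secret"  # illustrative value only


def sign(body: bytes) -> str:
    """HMAC-SHA256 of the body, as the sender would compute it."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def verify_exact(received: bytes, signature: str) -> bool:
    """Correct: verify the bytes exactly as received."""
    return hmac.compare_digest(sign(received), signature)


def verify_after_strip(received: bytes, signature: str) -> bool:
    """The bug: normalising the body before verification."""
    return hmac.compare_digest(sign(received.strip()), signature)


# An invented payload that happens to end in a newline.
received = b'{"event": {"occurred_at": "2026-04-02T10:00:00Z"}}\n'
signature = sign(received)

assert verify_exact(received, signature)
assert not verify_after_strip(received, signature)
```

Payloads without trailing whitespace pass both checks, which is why a bug like this surfaces only intermittently.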