Pre-Paging Context Loading

Context loaded before the on-call sees the page.

The idea

By the time a human is paged, the system already has 90 seconds of context: what changed recently, which other alerts fired, and which dashboards are relevant. Pre-paging context loading attaches that data to the alert payload before the page goes out. That saves 2-5 minutes of triage per incident; compounded over a year, that’s days of on-call time recovered.
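To make the yearly figure concrete, here is a rough back-of-the-envelope sketch. The 2-5 minute range comes from the paragraph above; the incident volume of 600 pages a year and the function name are purely illustrative assumptions, so substitute your own count.

```python
from __future__ import annotations


def yearly_hours_saved(incidents_per_year: int,
                       minutes_saved: tuple[float, float] = (2, 5)) -> tuple[float, float]:
    """Return the (low, high) hours of triage saved per year."""
    low, high = minutes_saved
    return incidents_per_year * low / 60, incidents_per_year * high / 60


# Hypothetical volume: 600 pages a year across the rotation.
low, high = yearly_hours_saved(600)
print(f"{low:.0f}-{high:.0f} hours per year")  # prints "20-50 hours per year"
```

At that assumed volume the saving is roughly one to two full days of wall-clock time; a busier rotation scales linearly.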

What to pre-load

Three categories of context cover most incidents. First, recent deploys for the affected service (Argo CD events, GitHub Actions runs), since 80% of incidents follow a deploy. Second, related alerts from the last 15 minutes, clustered so the on-call sees the full picture rather than a single firing signal. Third, impact data derivable from APM: top affected endpoints, top affected customers, and current load.
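The related-alerts category is the easiest to sketch in code. The example below is a minimal illustration, not a prescribed implementation: the `Alert` type, its fields, and the choice to cluster by service name are all assumptions; only the 15-minute window comes from the text above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

RELATED_WINDOW = timedelta(minutes=15)  # window taken from the text


@dataclass(frozen=True)
class Alert:  # hypothetical shape of a firing alert
    name: str
    service: str
    fired_at: datetime


def cluster_related(page: Alert, recent: list[Alert],
                    window: timedelta = RELATED_WINDOW) -> dict[str, list[str]]:
    """Group alerts that fired within `window` before the page, keyed by service."""
    earliest = page.fired_at - window
    clusters: dict[str, list[str]] = defaultdict(list)
    for alert in sorted(recent, key=lambda a: a.fired_at):  # oldest first
        if alert != page and earliest <= alert.fired_at <= page.fired_at:
            clusters[alert.service].append(alert.name)
    return dict(clusters)
```

Grouping by service is only one possible clustering key; a team with a dependency graph in its service catalog might group by upstream dependency instead.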

How to load

The mechanics are simple. A PagerDuty webhook triggers a Lambda or Cloud Run job. The job queries Datadog, Argo CD, and the service catalog, then writes the results back to the alert payload. Target a latency of 30 seconds; any slower and the human reaches the alert before the context arrives. Cache aggressively, because most incidents share context within a 5-minute window.

When it fails

Three failure modes deserve mitigation. Stale data: a deploy lookup that runs 30 minutes behind is worse than no data, because the on-call trusts wrong information. Too much data: 40 lines of context are unreadable on a phone, so cap the payload at 5 facts. Vendor outages: if Datadog is down, pre-loading fails, so fall back to a basic page rather than blocking the page.
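All three mitigations can live in one small guard around the page text. The sketch below is illustrative: the `Fact` type, its `priority` ranking, and the 5-minute staleness threshold are assumptions (the text only says that data 30 minutes behind is too old); the cap of 5 facts and the fall-back-to-basic-page behaviour come from the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

MAX_FACTS = 5                           # cap from the text
MAX_STALENESS = timedelta(minutes=5)    # assumed threshold; tune per source


@dataclass(frozen=True)
class Fact:  # hypothetical unit of context
    text: str
    as_of: datetime    # when the source data was last known to be current
    priority: int = 0  # lower sorts first (assumed ranking scheme)


def build_page(summary: str, load_facts: Callable[[], list[Fact]],
               now: datetime) -> str:
    """Return the page text: summary plus at most MAX_FACTS fresh facts."""
    try:
        facts = load_facts()
    except Exception:
        return summary  # vendor outage: send the basic page, never block
    fresh = [f for f in facts if now - f.as_of <= MAX_STALENESS]
    top = sorted(fresh, key=lambda f: f.priority)[:MAX_FACTS]
    return "\n".join([summary, *(f"- {f.text}" for f in top)])
```

Dropping stale facts silently is the simplest option; a variant could instead keep them with an explicit age label so the on-call knows how far to trust them.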

Get started

The starter ramp is concrete. Pick your top 3 services and build a simple webhook that adds “recent deploys” to the alert payload. Measure MTTA and MTTR before and after, targeting a 30-second drop in median triage time. Then iterate service by service; adding context to all services at once is over-investment.
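The before-and-after check can be as small as a median comparison. This sketch assumes you can export per-incident triage durations in seconds from your incident tooling; how "triage time" is measured is left to you, and the sample numbers below are made up for illustration.

```python
from statistics import median

TARGET_DROP_SECONDS = 30  # target from the text


def median_drop(before: list, after: list) -> float:
    """Seconds by which the median triage time fell (negative if it rose)."""
    return median(before) - median(after)


def met_target(before: list, after: list,
               target: float = TARGET_DROP_SECONDS) -> bool:
    """True when the median improved by at least `target` seconds."""
    return median_drop(before, after) >= target


# Hypothetical triage durations in seconds for one service.
before = [300, 240, 420]
after = [200, 250, 260]
print(median_drop(before, after))  # prints 50
```

Comparing medians rather than means keeps one unusually long incident from masking or faking an improvement, which matters when each service only has a handful of pages per month.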