Status Pages vs Alerts: Coordination
Internal alerts and external status updates coordinate.
The distinction
Internal alerts page on-call to fix problems; status pages tell customers what’s happening. Different audiences, different language, different cadence. Updating one without the other creates either silent outages (alerts fire, status page green) or fake outages (status page red, no real impact). Wire both together but keep them separately authored.
- Internal alert. Pages on-call to fix the problem; the engineering surface.
- Status page. Tells customers what’s happening; the customer surface.
- One-without-other failure modes. Silent outage (alerts fire, page green) or fake outage (page red, no impact).
- Wired but separately authored. Linked workflow but distinct copy; the audiences need different language.
When to update the status page
The status page lifecycle is well-defined. Customer-facing impact confirmed (more than 5% of users seeing degradation): “investigating”. Cause identified: “identified”. Mitigation deployed: “monitoring”. Confirmed clear: “resolved”. Updates every 30 minutes are fine; faster suggests customer comms should be a person, not a tool.
- Investigating threshold. 5% of users seeing degradation; the entry to public comms.
- Status lifecycle. Investigating → identified → monitoring → resolved.
- 30-minute updates fine. Status pages aren’t real-time; faster updates require a person.
- Per-status template. Each lifecycle state has a template; supports consistent copy.
Automation boundaries
Automation has clear boundaries. Don’t auto-publish from internal alerts to the public status page (false positives become public crises); do auto-create draft incidents on the status page from major alerts so the on-call edits and publishes when ready; Statuspage, Atlassian Statuspage, and instatus all support draft-then-publish workflows.
- No auto-publish. False positives from internal alerts become public crises if auto-published.
- Auto-create draft. Major alerts trigger a draft incident; on-call edits and publishes.
- Vendor support. Statuspage, Atlassian Statuspage, instatus all support draft-then-publish.
- Per-trigger draft policy. Which alerts create drafts is documented; supports consistent triggering.
Language
Status page text is customer-facing. “We’re investigating slow checkout” beats “checkout-api p99 latency above 200ms SLO”; always include impact (who is affected, what they see, what to do about it: refresh, retry, wait); update on a clock because a 10-minute silence reads as panic and even “still investigating” is better than nothing.
- Customer-facing language. “We’re investigating slow checkout” beats internal jargon.
- Always include impact. Who is affected, what they see, what to do about it (refresh, retry, wait).
- Update on a clock. A 10-minute silence reads as panic; even “still investigating” is better than nothing.
- Per-incident voice consistency. Status page voice is documented; supports calm, clear customer comms.
Apply this quarter
The application is concrete. Audit your last 5 customer-facing incidents to see if the status page reflected them in time (within 15 minutes is the bar); wire your top 3 alert categories into draft-creation on the status page (resist auto-publish); train on-call on status-page voice because the first incident is too late to learn.
- Audit last 5 incidents. Did the status page reflect them in time; 15 minutes is the bar.
- Wire top 3 alert categories. Draft-creation on the status page; resist auto-publish.
- Train on-call on voice. The first incident is too late to learn; rehearse beforehand.
- Per-quarter status-page review. Continued improvement of the status page workflow; supports the discipline.