Building a Status Page People Actually Trust (And What to Never Do)
Status pages either build trust or shred it. Here is what separates the two, based on how customers actually read them.
What “credible” means
A credible status page does three things: it confirms what users already suspect (within minutes, not hours), it narrows down the impact precisely (which component, which region, what percentage of users), and it commits to a next-update time that it keeps.
Get any of these wrong and customers will assume the status page lies.
The seven rules
- Post within 10 minutes of detection. Even “we are investigating reports of X” counts.
- Name the component. “API” is too coarse. “Checkout API, eu-west-1” is the right grain.
- State impact in user terms. Not “elevated error rate on backend service Foo,” but “logins failing for approximately 8% of users in Europe.”
- Commit to next-update time. “We'll post an update in 30 minutes” is a promise. Keep it, even if the update is “no change.”
- Link to the postmortem when it ships. The status entry for an outage should link to the postmortem within 14 days.
- Show historical uptime honestly. A 90-day graph with every incident marked. No quiet filtering.
- Auto-subscribe customers on incident impact. If they were affected, email them. Don't require opt-in to learn about their own downtime.
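The component, impact, and next-update rules above can be enforced in the shape of the update itself. The following is a minimal sketch, not a reference implementation: the `StatusUpdate` class, its field names, and the rendered wording are all hypothetical and not tied to any status page tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StatusUpdate:
    """Hypothetical structure for one status page update."""
    component: str            # e.g. "Checkout API"
    region: str               # e.g. "eu-west-1"
    user_impact: str          # what users see, e.g. "logins failing"
    percent_affected: float   # approximate share of users affected
    posted_at: datetime
    next_update_in: timedelta

    def __post_init__(self) -> None:
        # Reject the coarse or vague entries the rules warn against.
        if not self.component.strip() or not self.region.strip():
            raise ValueError("name the component and the region")
        if not self.user_impact.strip():
            raise ValueError("state the impact in user terms")
        if not 0 < self.percent_affected <= 100:
            raise ValueError("percent_affected must be in (0, 100]")
        if self.next_update_in <= timedelta(0):
            raise ValueError("commit to a next-update time in the future")

    def render(self) -> str:
        # The promised time is derived from the post time, so it is explicit.
        next_at = self.posted_at + self.next_update_in
        return (
            f"{self.component}, {self.region}: {self.user_impact} "
            f"for approximately {self.percent_affected:g}% of users. "
            f"Next update by {next_at:%H:%M} UTC."
        )


update = StatusUpdate(
    component="Checkout API",
    region="eu-west-1",
    user_impact="logins failing",
    percent_affected=8,
    posted_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    next_update_in=timedelta(minutes=30),
)
print(update.render())
```

Making the vague fields impossible to leave empty is a design choice: the scribe cannot post “some users may experience issues” without first deciding who and where.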
The three mistakes
- “Some users may experience intermittent issues.” Weasel phrasing. Say who, where, what percent.
- Marking resolved before recovery is confirmed for users. Wait for error rates to return to baseline for 15 minutes post-fix. A “Resolved” that reverts within an hour destroys the page's credibility for quarters.
- Skipping small incidents. Customers see them. When the status page shows green and their dashboards show red, they trust their dashboards.
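The 15-minute baseline rule for marking an incident resolved can be written as a simple gate. This is a sketch under assumptions: the `safe_to_resolve` function, the `(timestamp, error_rate)` sample format, and the baseline threshold are hypothetical, and real metrics would come from your monitoring system.

```python
from datetime import datetime, timedelta


def safe_to_resolve(samples, baseline_rate, now,
                    quiet_period=timedelta(minutes=15)):
    """Return True only if error rates stayed at or below baseline
    for the whole quiet period ending at `now`.

    samples: iterable of (timestamp, error_rate) pairs, in any order.
    """
    samples = list(samples)
    window_start = now - quiet_period
    recent = [rate for ts, rate in samples if window_start <= ts <= now]
    # Require a sample at or before the window start, so a short
    # observation history cannot pass as a full quiet period.
    observed_long_enough = any(ts <= window_start for ts, _ in samples)
    return (
        observed_long_enough
        and bool(recent)
        and all(rate <= baseline_rate for rate in recent)
    )


t0 = datetime(2024, 1, 1, 12, 0)
now = t0 + timedelta(minutes=20)
# One sample per minute for 20 minutes, all below a 2% baseline.
calm = [(t0 + timedelta(minutes=m), 0.01) for m in range(21)]
print(safe_to_resolve(calm, baseline_rate=0.02, now=now))
```

A spike anywhere inside the last 15 minutes makes the gate return False, which means the quiet period effectively restarts from the spike.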
Tooling suggestions
Statuspage (Atlassian), Instatus, and self-hosted options like Upptime all work. The tool matters less than the operational commitment.
Automate the “detect + post within 10 minutes” loop by wiring your incident tool (PagerDuty, Incident.io, FireHydrant) to auto-create a status page entry. Humans still edit the text, but the entry is live within 60 seconds of the incident being opened.
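A rough sketch of that wiring follows. It deliberately uses no real vendor API: the incident event shape (`opened_at`, `component`) and the draft entry fields are hypothetical placeholders for whatever your incident tool sends and your status page tool accepts.

```python
from datetime import datetime, timedelta, timezone


def draft_entry_from_incident(event: dict, now: datetime) -> dict:
    """Turn a (hypothetical) incident-opened event into a draft entry."""
    opened_at = datetime.fromisoformat(event["opened_at"])
    component = event.get("component", "unknown component")
    return {
        "status": "investigating",
        "component": component,
        # Placeholder text; a human edits it before or shortly after posting.
        "body": f"We are investigating reports of problems with {component}.",
        "next_update_by": (now + timedelta(minutes=30)).isoformat(),
        "needs_human_edit": True,
        # Lets you measure the entry's lag against the 60-second target.
        "seconds_since_incident_opened": (now - opened_at).total_seconds(),
    }


event = {
    "opened_at": "2024-01-01T12:00:00+00:00",
    "component": "Checkout API, eu-west-1",
}
now = datetime(2024, 1, 1, 12, 0, 40, tzinfo=timezone.utc)
entry = draft_entry_from_incident(event, now)
print(entry["body"])
```

In practice this function would sit inside a webhook handler, and the returned dictionary would be mapped onto whatever fields your status page tool expects.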
The meta-pattern
The status page is a trust artefact. It's not marketing, it's not legal coverage, it's a public commitment to transparency. The best ones read like a postmortem in real time; the worst read like a press release after the fact.
The handoff between incident and page
As soon as the incident channel opens, the scribe writes a draft status update. It is usually two sentences: what we think is happening, who is affected. It goes to the page within 10 minutes.
Every subsequent update on the page includes a commitment to the next update time. “Next update in 30 minutes” does far more for customer trust than “we will update when we have more,” because customers can hold you to it.
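A promised next-update time is easy to track mechanically. The sketch below assumes two hypothetical helpers, `next_update_due` and `is_overdue`; a real setup might run a check like this on a timer and nudge the scribe before the deadline passes.

```python
from datetime import datetime, timedelta


def next_update_due(last_posted_at: datetime,
                    promised_interval: timedelta) -> datetime:
    """The time by which the next update was promised."""
    return last_posted_at + promised_interval


def is_overdue(last_posted_at: datetime,
               promised_interval: timedelta,
               now: datetime) -> bool:
    """True once the promised time has passed without a new post."""
    return now > next_update_due(last_posted_at, promised_interval)


posted = datetime(2024, 1, 1, 12, 0)
promise = timedelta(minutes=30)
print(is_overdue(posted, promise, now=datetime(2024, 1, 1, 12, 25)))
print(is_overdue(posted, promise, now=datetime(2024, 1, 1, 12, 31)))
```

Posting a “no change” update resets `last_posted_at`, which is how the promise is kept even when there is nothing new to report.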
When the incident resolves, the page entry is not closed until error rates have been at baseline for a full 15 minutes. Resolved-then-unresolved is the single fastest way to lose trust in the page.