How to Write a Customer Status Update During an Incident
The status update is the part of incident response that engineers undervalue. A good one buys you 90 minutes of customer patience. A bad one burns it in 15.
The asymmetric value of a good update
Engineers think of status updates as a chore that steals 10 minutes from the fix. Customers experience the update as nearly the entire signal of whether you have the situation in hand. Five minutes of bad communication erases an hour of good engineering. The leverage is huge.
The reason for the asymmetry: customers can't see the engineering work. They can only see what you tell them. The team that resolves an incident in 30 minutes but communicates badly is experienced as worse than the team that resolves in 60 minutes but communicates well. This is unfair to engineering and yet structurally true; the public-facing artifact dominates the perceived experience.
The corollary: investing in comms quality has higher ROI per hour than investing in slightly faster mitigation. Most engineering teams have it backwards: they spend 95% of the effort on the technical work and 5% on the comms. The teams that earn reputations for being "calm during outages" are the ones that flip the ratio for the ten minutes it takes to write a great update.
The four-sentence structure
- What: a one-sentence description of the symptom in customer terms.
- Who: who is affected and roughly how many.
- Now: what the team is currently doing.
- When: when the next update is coming.
Four sentences, four pieces of information. The reader knows what to do with their next 30 minutes: do they need to switch to a workaround, do they need to alert their team, or do they wait for the all-clear?
Why exactly four? Three is not enough; it usually misses either the impact scope or the next-update commitment. Five or more is too much; the reader skims and misses the action they need to take. Four is the sweet spot tested across thousands of status pages: enough information to decide, little enough to absorb in 10 seconds.
The discipline of the four-sentence structure is what makes it useful. Without it, status updates drift toward marketing prose ("we are committed to providing the highest level of service...") or technical ramble ("the load balancer's connection pool was saturated due to..."). Four sentences, in this order, produces calm, customer-facing communication.
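The discipline can even be enforced mechanically. Here is a minimal sketch of a template function that assembles an update in the What/Who/Now/When order; the function name and fields are illustrative, not a standard API:

```python
def status_update(what: str, who: str, now: str, next_update: str) -> str:
    """Assemble a four-sentence status update in What/Who/Now/When order.

    All names and wording here are illustrative, not a standard.
    """
    sentences = [
        what,         # What: the symptom in customer terms
        who,          # Who: affected population and rough scale
        now,          # Now: what the team is currently doing
        next_update,  # When: the next-update commitment
    ]
    # Normalise trailing punctuation so each field becomes one sentence.
    return " ".join(s.rstrip(".") + "." for s in sentences)

update = status_update(
    "We are investigating slow page loads on checkout",
    "Approximately 8% of European customers are seeing 5+ second response times",
    "We have identified the affected service and are deploying a fix",
    "Next update at 14:30 UTC",
)
```

Forcing each piece of information into its own named argument means a missing "When" is a visible hole at write time, not a discovery after publishing.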
Words that signal competence
"We have identified..." (you know the cause). "We are deploying..." (you have action). "Currently affecting roughly..." (you have measurement). "Next update at..." (you have rhythm).
Each of these phrases does double work. They convey information AND signal that you have your act together. The competence signal is what builds trust over a long incident. Customers reading a series of updates over 90 minutes are reading the SHAPE of the updates as much as the content; if the updates have these phrases consistently, the customer infers a team that knows what it's doing, even if the resolution is taking longer than hoped.
Other phrases that signal competence: "the cause was a [specific component]" (you understand it), "we have implemented a workaround" (you've taken action even if not fully resolved), "we will publish a postmortem within 5 business days" (you take responsibility). Each is concrete; each is a commitment.
Words that sound like cover-up
"Some users may experience..." (vagueness). "Brief disruption" (you don't know how long). "Working as expected" (you don't know what's wrong). "Apologies for any inconvenience" (the apology of someone who doesn't really get it).
The pattern these phrases share: they all distance you from the impact. "Some users" makes the affected population fuzzy; "brief" makes the duration negotiable; "as expected" implies the situation is normal; "any inconvenience" treats the customer's pain as hypothetical. Each is technically accurate and yet feels evasive. Customers notice.
The replacements. "Some users" → "approximately 12% of users in EU." "Brief" → "approximately 35 minutes so far." "Working as expected" → "the system is operating but with [specific degraded behaviour]." "Apologies for any inconvenience" → "we know this disrupted [specific user activity]." Each replacement is concrete; each treats the customer's experience as real.
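A banned-phrase check can be wired into the publishing flow. This is a hedged sketch; the phrase list mirrors the examples above, and the replacement hints are suggestions, not a complete lint ruleset:

```python
import re

# Vague phrases mapped to the kind of concrete detail that should
# replace them. Illustrative list; extend with your own offenders.
BANNED = {
    r"some users": "name the affected share, e.g. 'approximately 12% of users in EU'",
    r"brief disruption": "state the elapsed time, e.g. 'approximately 35 minutes so far'",
    r"working as expected": "describe the specific degraded behaviour",
    r"any inconvenience": "name the specific user activity that was disrupted",
}

def lint_update(text: str) -> list[str]:
    """Return one warning per banned phrase found (case-insensitive)."""
    warnings = []
    for pattern, fix in BANNED.items():
        if re.search(pattern, text, re.IGNORECASE):
            warnings.append(f"'{pattern}' is vague -- instead, {fix}")
    return warnings

# Flags both 'some users' and 'brief disruption':
print(lint_update("Some users may experience a brief disruption."))
```

Running this as a pre-publish hook turns the distancing-language habit into a mechanical catch rather than an editorial judgment under pressure.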
Cadence through a long incident
First update within 15 minutes. Then every 30 minutes during active investigation. Every hour during stable monitoring. Final resolution update within 30 minutes of all-clear. Missing a scheduled update is the moment trust drops.
What to do when the cadence outruns the work. The right move is to publish the cadence update with no new information: "We are still investigating. The current theory remains [X]. We expect to know more within 30 minutes." Customers prefer "no news" to silence; team members get a rhythm marker.
The cadence is asymmetric: you can shorten it (extra updates during a fast-moving incident) but never lengthen it without explicit signal. If you've been posting every 30 minutes and now want to post every hour, say so: "Things are stable; we'll move to hourly updates from here." Without the explicit signal, customers experience the gap as silence and assume the situation has worsened.
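The cadence rules above are simple enough to compute, which is useful if you want a bot or dashboard to nag the incident commander. A minimal sketch, assuming only the two phases described in the text:

```python
from datetime import datetime, timedelta

def next_update_due(incident_start: datetime, now: datetime, phase: str) -> datetime:
    """When is the next status update due under the cadence in the text?

    phase is 'investigating' (every 30 min) or 'monitoring' (hourly).
    The first update is due 15 minutes after incident start.
    Illustrative helper, not a real library API.
    """
    first_due = incident_start + timedelta(minutes=15)
    if now < first_due:
        return first_due
    interval = timedelta(minutes=30 if phase == "investigating" else 60)
    # Next tick strictly after 'now', counted from incident start.
    ticks = (now - incident_start) // interval + 1
    return incident_start + ticks * interval
```

A reminder fired a few minutes before each due time is what makes "publish the cadence update with no new information" happen in practice.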
Two examples, side by side
Bad: "We are aware that some users may be experiencing slowness on the platform. Our team is investigating. Sorry for any inconvenience. We will update as soon as we have more information."
Good: "We are investigating reports of slow page loads affecting checkout for European customers. Approximately 8% of paying users are seeing 5+ second response times. We have identified the affected service and are deploying a fix. Next update at 14:30 UTC."
The bad example has the right intent and zero specificity. It's a placeholder dressed up as a status update. The good example is barely longer and carries 4× the information. It tells the reader: who's affected (Europeans, 8%), what they see (5+ second loads, checkout specifically), what we're doing (deploying fix), and when to check back (14:30 UTC). A reader has everything they need to decide whether to wait, alert their team, or escalate.
After resolution
The all-clear comm follows a fixed template: timestamp + brief cause + what we did + when the postmortem will be published. Send it within 30 minutes of resolution; any later and customers have moved on to writing their own internal incident reports, and your update arrives too late to inform them.
The postmortem itself is a separate artifact, published 5-10 business days later. The resolution comm should explicitly commit to the postmortem date. Missing the postmortem date is a small but real trust hit; enterprise customers track this and bring it up at renewal time.
One mistake to avoid: omitting the cause from the resolution comm. "This is now resolved" without "the cause was X" leaves customers wondering whether the same thing will happen tomorrow. Even a one-sentence cause ("an expired certificate caused authentication failures; the certificate has been renewed and rotation has been automated") closes the loop in a way that "resolved" alone doesn't.
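Both requirements, a named cause and a committed postmortem date, can be made structurally unavoidable. A hedged sketch of a resolution-comm template where the commitment is a required argument (names and wording are illustrative):

```python
from datetime import date

def resolution_comm(resolved_at: str, cause: str, action: str,
                    postmortem_date: date) -> str:
    """Assemble the all-clear post: timestamp + cause + action + postmortem date.

    postmortem_date is required on purpose -- the template refuses
    to ship without the commitment. Illustrative, not a real API.
    """
    if not cause.strip():
        raise ValueError("resolution comm must name the cause")
    return (
        f"Resolved at {resolved_at}. Cause: {cause}. "
        f"Action taken: {action}. "
        f"Postmortem to be published by {postmortem_date.isoformat()}."
    )
```

Making the cause a validated field and the postmortem date a required one encodes both lessons: "resolved" alone cannot be published.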
What to do this week
Three moves. (1) Audit your last 5 status-page updates against the four-sentence structure. Most teams find they're missing the "When" sentence; that's the easiest fix. (2) Build a checklist of banned phrases (the four phrases from earlier, plus "sorry for the inconvenience", "some users", "brief", "should") and pin it next to your status-page editor. (3) Write a post-resolution template that includes the postmortem-publication date as a required field. The required field forces the team to commit; the commitment forces the postmortem to ship on time.