Incident Response · Intermediate · By Samson Tanimawo, PhD · Published Sep 27, 2026 · 7 min read

The First 15 Minutes of Any Incident

What happens in the first 15 minutes determines how the next two hours go. Five disciplined moves get you ahead of a fire instead of chasing it.

Why the first 15 minutes matter

How the next hour of an incident goes is largely determined by the first 15 minutes. A clean start means a calm bridge, a clear timeline, and a postmortem that writes itself. A messy start means people debugging in parallel, three theories competing, customers learning about the outage from Twitter, and a postmortem nobody wants to write.

The reason is structural. In the first 15 minutes, the team forms its mental model of the incident. Whatever theory the loudest engineer voices in those minutes tends to anchor the bridge for the next hour. If the team starts with "looks like the database," they'll keep checking database things even after the symptoms point elsewhere. The five moves below are a forcing function against premature anchoring: they spend the first 15 minutes establishing facts and structure rather than diving into hypotheses.

Move 1: Ack within 5

Whoever caught the page acknowledges within five minutes. Not "I'm looking" in chat, but actually clicking the ack button so the alerting system knows a human is on it. Without the ack, the secondary on-call wakes up at the 5-minute mark and now two engineers are debugging the same alert in parallel, which is worse than having one, because each assumes the other is on it.

The ack also starts the timeline. PagerDuty, Opsgenie, and Incident.io all record the ack with a timestamp, and that timestamp becomes the anchor for every subsequent measurement: detection-to-ack, ack-to-mitigation, ack-to-resolution. Skip the ack and those measurements lose their anchor. Worse, the postmortem can't reconstruct who was driving when, and the "what could have gone faster?" conversation degenerates into vibes.
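
If your tooling doesn't surface those intervals, each is one subtraction once the ack timestamp exists. A minimal sketch; the field names are illustrative, not any vendor's schema:

```python
from datetime import datetime

# Minimal sketch: the three standard intervals, all anchored on the ack.
# Field names are illustrative, not any specific tool's schema.
def incident_intervals(detected_at: datetime,
                       acked_at: datetime,
                       mitigated_at: datetime,
                       resolved_at: datetime) -> dict:
    return {
        "detection_to_ack": acked_at - detected_at,
        "ack_to_mitigation": mitigated_at - acked_at,
        "ack_to_resolution": resolved_at - acked_at,
    }
```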

Common mistake: posting "I'm on it" in the on-call Slack channel but never clicking ack. Slack and PagerDuty don't talk to each other unless you've integrated them; the ack happens in PagerDuty. Train new on-callers to do both.

Move 2: Set severity

Use the 10-second test: give yourself ten seconds to classify, then move on. Don't agonise; if uncertain, assume the higher severity. Severity drives the rest of the moves: who you page next, how fast comms go out, whether the bridge is voice or async.

The trap here is paralysis. Engineers want to feel "sure" before they classify, and they're never sure 30 seconds in. The right framing is provisional: "calling this SEV2 for now, will revise as we learn more." Provisional classifications work fine; missing classifications stop the response cold because the next move (page who?) depends on the severity.

If the symptom is genuinely ambiguous, classify by current visible impact and let auto-escalation handle the upgrade. SEV2 with 30 minutes of unresolved investigation auto-promotes to SEV1; you don't have to predict the future, you just have to start the clock.
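
The clock itself is cheap to automate. A sketch of the promotion check, run periodically by an incident bot or scheduler; the names and threshold are assumptions, tune them to your own policy:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a provisional SEV2 still unmitigated after 30 minutes is
# treated as a SEV1. Run this check on a timer, not in a human's head.
AUTO_PROMOTE_AFTER = timedelta(minutes=30)

def effective_severity(declared: str, opened_at: datetime, mitigated: bool) -> str:
    age = datetime.now(timezone.utc) - opened_at
    if declared == "SEV2" and not mitigated and age >= AUTO_PROMOTE_AFTER:
        return "SEV1"
    return declared
```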

Move 3: Page the right people

SEV1: pull in the IC, the comms lead, and the operations lead. SEV2: pull in the service owner. SEV3: just the on-call. The wrong move here is pulling in too few people: you can always release people early, but you can't easily re-summon them after they've gone back to their day.

For SEV1 specifically, the call list is structural, not personal. The IC is whoever's on the IC rotation that week, not "whoever's free." The service owner is the team lead for the affected service, regardless of whether they were the most recent committer. Calling specific people because you trust them more is friendly; calling roles is professional. The role-based approach also survives one engineer being on vacation.
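
One way to keep the call list structural is to encode it as data keyed by severity and rotation rather than by names. A minimal sketch; the rotation names are placeholders for your own:

```python
# Sketch of a role-based call list: severities map to rotations, never to
# named individuals. Rotation names are placeholders.
PAGE_BY_SEVERITY = {
    "SEV1": ["incident-commander", "comms-lead", "ops-lead", "service-owner"],
    "SEV2": ["service-owner"],
    "SEV3": ["primary-oncall"],
}

def rotations_to_page(severity: str) -> list:
    # Whoever currently holds each rotation gets the page; a vacation
    # changes the person, not this list.
    return PAGE_BY_SEVERITY.get(severity, ["primary-oncall"])
```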

Antipattern: paging the CTO directly to make sure leadership knows. Almost always the wrong move: it bypasses the IC, creates a parallel chain of command, and signals that the team doesn't trust its own escalation policy. The IC pages leadership; the on-call doesn't.

Move 4: First customer comms

SEV1 needs an external "we are aware" within 15 minutes; SEV2 within 30. The bar for the first message is low: three sentences, one timestamp, one promise to update. You are not promising a fix; you are promising that you are paying attention.

The reason the bar is low is that customers don't expect you to know the cause yet; they expect you to know that something is wrong. The fastest path to losing trust is silence. The second fastest is over-confident speculation that turns out to be wrong. "We are investigating reports of slow checkout. Customers may experience timeouts during payment. We will provide an update within 30 minutes" is the right shape.

Pre-write the templates. The IC shouldn't be authoring customer comms in the first 15 minutes; they should be filling in three slots: SYMPTOM, SCOPE, NEXT-UPDATE-TIME. If publishing the comms takes longer than 60 seconds, the template is too complex.
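
A three-slot template is small enough to live in the runbook and fill under pressure. A sketch, with wording adapted from the example above:

```python
# Sketch of a pre-written first-comms template with exactly three slots.
# Adapt the wording to your own voice.
TEMPLATE = (
    "We are investigating reports of {symptom}. "
    "{scope}. "
    "We will provide an update by {next_update}."
)

def first_comms(symptom: str, scope: str, next_update: str) -> str:
    return TEMPLATE.format(symptom=symptom, scope=scope, next_update=next_update)

print(first_comms("slow checkout",
                  "Customers may experience timeouts during payment",
                  "14:30 UTC"))
```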

Move 5: Declare ownership

"I'm IC for this." Said out loud, in writing in the channel. Without it, three engineers think they're driving and none are. The most common pattern in postmortems-of-incidents-that-went-poorly is "nobody felt empowered to make a call." That comes from the absence of a declared IC.

The IC role is uncomfortable. It involves making decisions on incomplete information, telling senior engineers to stop debugging because the bridge is going sideways, and writing customer comms while the situation is still ambiguous. Most engineers don't volunteer for it the first time. Train rotating ICs deliberately: every senior engineer should run an IC shift at least quarterly.

The second-best move when nobody declares IC: someone declares "I'll IC unless someone else wants to." That phrasing nominates them while leaving room for a more senior engineer to take it. Two seconds of awkwardness saves twenty minutes of confusion.

When the clock matters most

The first 15 minutes are forgiving on technical decisions and unforgiving on coordination decisions. Take an extra 60 seconds to nail the coordination moves: ack, severity, page, comms, IC. Save the technical depth for minute 16. Almost no incident is made worse by spending an extra minute on coordination; many incidents are made dramatically worse by skipping coordination to dive into the suspected cause.

Common antipatterns to avoid in those 15 minutes. (1) Diving into the metric dashboard alone without telling the team: this produces siloed investigation and missed signals. (2) Calling the cause early ("I bet it's the database"): this anchors the bridge, and the team chases that theory long after the symptoms diverge. (3) Skipping comms because "we'll have a fix in 5 minutes": you won't, and now customers are 35 minutes in with no signal from you.

What to do this week

Three concrete moves. (1) Put the five moves on a card and pin it in your incident channel; the visible checklist beats any policy doc. (2) In your next on-call training, run a tabletop where each new on-caller declares "I'm IC" out loud; the practice removes the awkwardness. (3) Audit your last 5 SEV2+ incidents: did the team execute the five moves in the first 15 minutes? Track the gap; the gap is your improvement work for the next quarter.
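
For (3), the audit fits in a few lines once the timestamps exist. A sketch, assuming each incident record carries a per-move timestamp; the field names are hypothetical:

```python
from datetime import timedelta

# Sketch of the audit in (3): for each recent SEV2+ incident, check whether
# each of the five moves happened within 15 minutes of detection.
# The incident records and their field names are assumptions, not a real API.
FIFTEEN_MIN = timedelta(minutes=15)
MOVES = ["acked_at", "severity_set_at", "paged_at", "comms_sent_at", "ic_declared_at"]

def audit(incident: dict) -> dict:
    detected = incident["detected_at"]
    return {
        move: (incident.get(move) is not None
               and incident[move] - detected <= FIFTEEN_MIN)
        for move in MOVES
    }
```

The False entries across those five incidents are the gap, and the gap is the improvement work.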