MTTA Targets and How to Hit Them
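Mean time to acknowledge (MTTA) is the time from a page being sent to someone acknowledging it. This page covers the standard targets and the techniques for hitting them.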
Standard MTTA targets
The standard MTTA targets scale with severity. Sev 1: under 5 minutes (customer-impacting incidents need someone responding fast); Sev 2: under 15 minutes (significant but not critical, forgiving but bounded); Sev 3: under 1 hour during business hours (worth knowing about, not worth waking someone for).
- Sev 1: under 5 minutes. Customer-impacting; pager goes off, on-call acknowledges.
- Sev 2: under 15 minutes. Significant but not critical; forgiving but bounded.
- Sev 3: under 1 hour during business hours. Worth knowing about, not worth waking someone for.
- Per-tier targets documented. Commit the numbers to the engineering handbook; a written target supports stakeholder alignment.
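One way to keep the documented numbers and the tooling in agreement is to hold the targets in a single lookup. The sketch below is illustrative only: the `MTTA_TARGETS` table and the `meets_target` helper are hypothetical names, and the Sev 3 business-hours restriction is noted in a comment rather than enforced.

```python
from datetime import timedelta

# Targets copied from the list above; the table layout is an assumption.
MTTA_TARGETS = {
    "sev1": timedelta(minutes=5),
    "sev2": timedelta(minutes=15),
    "sev3": timedelta(hours=1),  # applies during business hours only
}

def meets_target(severity: str, time_to_ack: timedelta) -> bool:
    """Return True when the ack landed strictly under the tier's target."""
    return time_to_ack < MTTA_TARGETS[severity]

print(meets_target("sev1", timedelta(minutes=4)))   # inside the Sev 1 target
print(meets_target("sev2", timedelta(minutes=20)))  # outside the Sev 2 target
```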
Measuring MTTA accurately
Accurate measurement matters. Measure from page sent to acknowledged, not from incident detected to acknowledged, because the latter conflates detection time with response time. Slice the data three ways: the per-engineer view shows individual responsiveness, the per-shift view shows rotation health, and the per-service view shows alert quality. Outliers matter more than averages, because a 3-minute median can hide 30-minute outliers.
- Page-sent to ack. Not from incident detected; the detection time is its own metric.
- Per-engineer view. Individual responsiveness; supports fairness.
- Per-shift and per-service views. Rotation health and alert quality respectively.
- Outliers over averages. A 3-minute median hides 30-minute outliers; the 99th percentile is the signal.
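The point about outliers can be checked with a few lines of arithmetic. The sketch below assumes each page is available as a (sent, acknowledged) timestamp pair; the sample data is invented to match the 3-minute and 30-minute figures above, and `ack_seconds` and `percentile` are hypothetical helper names. The percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from datetime import datetime, timedelta
from statistics import median

def ack_seconds(sent_at: datetime, acked_at: datetime) -> float:
    """Response time only: page sent to acknowledged, excluding detection time."""
    return (acked_at - sent_at).total_seconds()

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Invented data: 97 pages acknowledged in 3 minutes, 3 pages in 30 minutes.
start = datetime(2024, 1, 1, 9, 0)
fast = [(start, start + timedelta(minutes=3))] * 97
slow = [(start, start + timedelta(minutes=30))] * 3
durations = [ack_seconds(sent, acked) for sent, acked in fast + slow]

print(median(durations) / 60)          # median, in minutes
print(percentile(durations, 99) / 60)  # 99th percentile, in minutes
```

On this data the median is 3 minutes while the 99th percentile is 30 minutes, so a dashboard that shows only the median would report a healthy rotation.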
How to hit the targets
Three mechanisms drive consistent MTTA: reliable paging across multiple channels (phone, app, SMS), tested quarterly with synthetic pages; a backup on-call with explicit escalation when the primary doesn’t acknowledge within 5 minutes; and tooling that minimises friction, with one-click acknowledge from app, terminal, or chat.
- Multi-channel paging. Phone, app, SMS; tested quarterly with synthetic pages.
- Backup with escalation. If primary doesn’t ack in 5 minutes, secondary gets paged.
- One-click acknowledge. App, terminal, or chat; minimise friction at ack time.
- Quarterly synthetic test. Validate the whole paging chain before a real incident depends on it; this supports MTTA target adherence.
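The escalation rule reduces to a single time comparison. The sketch below is a minimal illustration and not any paging product's API: `should_escalate` and `ESCALATION_TIMEOUT` are hypothetical names, and the caller is assumed to know when the page was sent and whether it has been acknowledged.

```python
from datetime import datetime, timedelta

ESCALATION_TIMEOUT = timedelta(minutes=5)  # from the escalation rule above

def should_escalate(sent_at: datetime, now: datetime, acknowledged: bool) -> bool:
    """Page the secondary once the primary has left the page unacknowledged for the timeout."""
    return not acknowledged and (now - sent_at) >= ESCALATION_TIMEOUT

sent = datetime(2024, 1, 1, 2, 0)
print(should_escalate(sent, sent + timedelta(minutes=3), acknowledged=False))  # still within the window
print(should_escalate(sent, sent + timedelta(minutes=5), acknowledged=False))  # timeout reached
```

The same function can drive the quarterly synthetic test: send a test page, leave it unacknowledged, and confirm that the secondary is paged once the timeout passes.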
Reading the trends
MTTA trends tell different stories. Trending up means responsiveness is degrading (paging tool issues, on-call burnout, rotation understaffing). Trending down is good, but watch for over-eager acks, where someone acknowledges without acting; track time-to-action separately. The per-time-of-day view shows shift quality; night shifts are often higher than day shifts, which is expected.
- Trending up. Responsiveness degrading; paging tool issues, on-call burnout, rotation understaffing.
- Trending down with caveat. Watch for over-eager acks; track time-to-action separately.
- Per-time-of-day view. Night shifts are often higher than day, which is expected; pattern-of-life matters.
- Weekly trend dashboard. Keep the trend visible to the team; this supports continuous awareness.
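Both the weekly trend and the time-of-day view come from grouping the same data with different keys. The sketch below assumes each page is a (sent timestamp, seconds to acknowledge) pair; the helper names and the sample values are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def median_by(pages, key):
    """Group ack times by key(sent_at) and return the median of each group."""
    groups = defaultdict(list)
    for sent_at, seconds in pages:
        groups[key(sent_at)].append(seconds)
    return {k: median(v) for k, v in sorted(groups.items())}

def iso_week(ts: datetime):
    """(ISO year, ISO week number), so weeks sort correctly across year boundaries."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])

def hour_of_day(ts: datetime) -> int:
    return ts.hour

# Invented sample: one night page and one day page in each of two weeks.
pages = [
    (datetime(2024, 1, 1, 3), 400), (datetime(2024, 1, 2, 14), 120),
    (datetime(2024, 1, 8, 3), 600), (datetime(2024, 1, 9, 14), 180),
]
print(median_by(pages, iso_week))     # weekly trend
print(median_by(pages, hour_of_day))  # time-of-day view
```

In this sample the weekly median rises from the first week to the second, and the 03:00 pages are slower than the 14:00 pages in both weeks. Time-to-action could be tracked with the same grouping by passing a different duration per page.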
When MTTA is consistently bad
Consistently bad MTTA usually has one of three causes; investigate them in this order. Tool reliability comes first, because lost pages produce missed acknowledgements (the tool is the floor). Rotation health comes second: burnout, understaffing, and on-call fatigue; survey the rotation and act on the findings. Process comes third, where routing fixes are usually high-leverage.
- Tool reliability first. Lost pages produce missed acks; the tool is the floor.
- Rotation health second. Burnout, understaffing, on-call fatigue; survey, act on findings.
- Process third. Check that pages go to the right person and are properly classified; routing fixes are high-leverage.
- Per-cause investigation playbook. Each cause has a check; supports fast diagnosis.
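The ordering above can be encoded as a list of checks that is walked from top to bottom. This is only a sketch of the idea: every metric name and threshold below is hypothetical and would need to be replaced with whatever the team actually measures.

```python
# Ordered checks: tool first, rotation second, process third.
# All metric names and thresholds are hypothetical placeholders.
PLAYBOOK = [
    ("tool reliability", lambda m: m["page_delivery_rate"] < 0.999),
    ("rotation health", lambda m: m["pages_per_shift"] > 10),
    ("process", lambda m: m["misrouted_page_rate"] > 0.05),
]

def first_suspect(metrics):
    """Return the first cause, in playbook order, whose check flags a problem."""
    for cause, looks_bad in PLAYBOOK:
        if looks_bad(metrics):
            return cause
    return None

metrics = {"page_delivery_rate": 0.97, "pages_per_shift": 4, "misrouted_page_rate": 0.20}
print(first_suspect(metrics))
```

Here both the tool check and the process check would flag a problem, but the function reports tool reliability because it is checked first, which matches the investigation order described above.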