Best Practices Intermediate By Samson Tanimawo, PhD Published Dec 2, 2025 5 min read

Game Days vs Fire Drills: What Each Practice Really Trains

A game day rehearses a known incident; a fire drill tests recovery from a surprise. They train different muscles, and most teams skip one.

Two practices, two muscles

A game day rehearses a known scenario the team has anticipated. A fire drill is a surprise: someone (the GM) injects a failure without warning and the team responds. Different muscles. A team good at game days but not at fire drills will execute the playbook beautifully in a known scenario and freeze the first time the failure is one they had not pre-loaded.

The cognitive distinction. Game days train preparation; fire drills train improvisation. Both are necessary. A team that only does game days handles known incidents flawlessly and panics on novel ones. A team that only does fire drills handles surprise but never invests in playbooks that would prevent the same fire repeatedly.

The investment trade-off. Game days are cheap: planned, documented, predictable. Fire drills are expensive: they need a GM who knows the system, they cause real disruption, and they carry a possible blast radius. Most teams should run game days quarterly and hold fire drills to about one a year at first, moving toward a semi-annual rhythm once both muscles are strong.

Game days

Pick a scenario. Plan it. Run it on a Tuesday afternoon with the on-call team. The goal is to validate the runbook, the access paths, the decision tree. Game days catch process gaps and stale credentials, and they let new on-callers practice without the cost of a real incident.

The scenario selection. Start with the most-likely incidents: a database failover, a deploy regression, a dependency outage. Each scenario the team rehearses is one that, when it happens for real, becomes a 20-minute incident instead of a 60-minute one. The leverage is large.

What game days catch. Stale credentials nobody noticed expired. Runbooks that reference old service names. Access paths that broke during a permissions migration. The on-call who hasn't done this scenario before. Each is a ticking bomb that the game day disarms.
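One of these gaps, runbooks that reference old service names, can be partly checked before the game day even starts. The following is a minimal sketch, not a definitive tool. It assumes a hypothetical convention where runbooks mention services as `svc:<name>`, and a hypothetical inventory set `CURRENT_SERVICES`; neither comes from any real system.

```python
import re

# Hypothetical inventory of services that currently exist (assumption).
CURRENT_SERVICES = {"orders-api", "billing-worker", "auth-gateway"}

# Assumed convention: runbooks mention services as `svc:<name>`.
SERVICE_REF = re.compile(r"svc:([a-z0-9-]+)")


def stale_service_refs(runbook_text, current_services):
    """Return service names the runbook mentions that are not in the inventory."""
    mentioned = set(SERVICE_REF.findall(runbook_text))
    return sorted(mentioned - set(current_services))


# Illustrative runbook text; step 2 points at a service that was renamed.
runbook = """
1. Check error rate on svc:orders-api.
2. Restart svc:billing-cron if the queue is stuck.
3. Confirm logins through svc:auth-gateway.
"""

stale = stale_service_refs(runbook, CURRENT_SERVICES)
print(stale)
```

A check like this only finds names that have vanished from the inventory. It does not replace running the runbook against real systems, which is where expired credentials and broken access paths show up.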

Fire drills

The GM (a senior engineer who is not on call) introduces a real-but-non-customer-impacting failure and watches what happens. Unless the drill is stopped early, the team does not learn it was a drill until the postmortem. Fire drills catch what game days miss: surprise tolerance, how the team behaves when the runbook does not apply, and which engineers freeze under pressure (so they can be coached).

The GM's role. The GM is a senior engineer who knows the system intimately. They inject the failure on a quiet afternoon, observe the response in real time, and reveal the drill in the postmortem. The GM doesn't help during the drill; the team must respond as they would to a real incident.

The blast-radius care. Fire drills must be safe. Inject failures in non-customer-facing systems first; only escalate to customer-facing once the team has demonstrated competence. The GM has a kill switch: if the drill goes worse than expected, they reveal it immediately and stop.
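The two safety rules above, non-customer-facing first and a kill switch, can be sketched as a small wrapper around the injected failure. This is an illustration under stated assumptions: the `FireDrill` class, its parameters, and the `staging_cache` stand-in are all hypothetical, and a real injection would call whatever tooling the team already uses.

```python
class FireDrill:
    """Illustrative drill wrapper with a blast-radius gate and a kill switch."""

    def __init__(self, target, customer_facing, inject, revert):
        self.target = target
        self.customer_facing = customer_facing
        self._inject = inject    # callable that starts the failure
        self._revert = revert    # callable that undoes it
        self.active = False
        self.revealed = False

    def start(self, team_cleared_for_customer_facing=False):
        # Blast-radius gate: refuse customer-facing targets until the team
        # has demonstrated competence on safer ones.
        if self.customer_facing and not team_cleared_for_customer_facing:
            raise PermissionError(
                f"{self.target}: team not yet cleared for customer-facing drills"
            )
        self._inject()
        self.active = True

    def kill(self, reason):
        # Kill switch: undo the failure and reveal the drill immediately.
        if self.active:
            self._revert()
            self.active = False
        self.revealed = True
        return f"DRILL ABORTED on {self.target}: {reason}"


# Stand-in for a non-customer-facing system (hypothetical).
staging_cache = {"healthy": True}

drill = FireDrill(
    target="staging-cache",
    customer_facing=False,
    inject=lambda: staging_cache.update(healthy=False),
    revert=lambda: staging_cache.update(healthy=True),
)
drill.start()
message = drill.kill("response going worse than expected")
print(message)
```

The design choice worth keeping is that the revert path is defined before the failure is injected, so the GM never has to improvise the rollback.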

Cadence

Game days quarterly. Fire drills semi-annually, and only when the team is mature enough that a surprise will not damage trust. Less often and the muscle atrophies; more often and the team treats them as theatre.

The atrophy rate. Game-day muscles atrophy in 4-6 months without practice. The runbook the team rehearsed in Q1 will have small drift by Q3; the team that hasn't rehearsed since Q1 will hit the drift during a real incident. Quarterly maintains freshness.
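The cadence can be turned into a simple overdue check. The sketch below assumes a quarter is roughly 92 days and six months roughly 183 days; the constant names, the function, and the dates are illustrative, not taken from any tool.

```python
from datetime import date

# Assumed thresholds: roughly one quarter and roughly six months.
GAME_DAY_MAX_GAP_DAYS = 92
FIRE_DRILL_MAX_GAP_DAYS = 183


def is_overdue(last_run, today, max_gap_days):
    """True when more than max_gap_days have passed since the last exercise."""
    return (today - last_run).days > max_gap_days


# Illustrative dates only.
last_exercise = date(2025, 1, 15)
today = date(2025, 6, 1)

game_day_overdue = is_overdue(last_exercise, today, GAME_DAY_MAX_GAP_DAYS)
fire_drill_overdue = is_overdue(last_exercise, today, FIRE_DRILL_MAX_GAP_DAYS)
print(game_day_overdue, fire_drill_overdue)
```

With these example dates the gap is 137 days, so a game day would be overdue while a fire drill would not.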

The trust constraint on fire drills. New teams or teams in a stressful period (recent layoffs, major reorg) shouldn't fire-drill. The exercise requires baseline trust between the team and the GM; without trust, the drill becomes adversarial. Wait for cultural stability.

When the practice is sham

Three signs. The same engineer is the IC every time. The runbook in the practice is different from the runbook in production. The postmortem is short and complimentary. Each one means the team has stopped practising and is performing.

The same-IC sign. If the senior engineer is always IC, junior engineers aren't getting practice. The exercise's training value is concentrated in one person; everyone else is observing. Rotate ICs; the awkward first time for each junior is the point.
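Rotation is easier to keep honest when the next IC is picked by rule rather than by habit. A minimal sketch, assuming the team keeps an ordered history of who led each past exercise; the roster names and the `next_ic` function are hypothetical.

```python
def next_ic(roster, history):
    """Pick whoever has led the fewest exercises.

    Ties go to the person who led least recently (or never).
    `history` is ordered oldest to newest.
    """
    def key(name):
        count = history.count(name)
        # Index of the most recent exercise this person led; -1 if none.
        last = max((i for i, n in enumerate(history) if n == name), default=-1)
        return (count, last)

    return min(roster, key=key)


# Hypothetical roster and exercise history.
roster = ["ana", "bo", "chen"]
history = ["ana", "ana", "bo"]

print(next_ic(roster, history))
```

Here "ana" has led twice and "bo" once, so "chen", who has never led, is picked next.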

The runbook divergence. Teams sometimes use a "clean" runbook for the game day that's better than the actual one. The drill succeeds; the real incident fails because the actual runbook has rotted. Use production runbooks; if they're inadequate, that's the lesson.
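One way to confirm the practice copy really is the production runbook is to compare fingerprints of the two texts before the exercise. This is a sketch under the assumption that both runbooks are available as plain text; the function names are illustrative.

```python
import hashlib


def fingerprint(text):
    """SHA-256 of the runbook text, ignoring trailing whitespace differences."""
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def same_runbook(practice_text, production_text):
    """True when the practice copy matches production after normalization."""
    return fingerprint(practice_text) == fingerprint(production_text)


# Illustrative texts: the practice copy has a step production lacks.
production = "1. Fail over the database.\n2. Verify replication.\n"
practice = "1. Fail over the database.\n2. Verify replication.\n3. Notify support.\n"

print(same_runbook(practice, production))
```

A mismatch does not say which copy is right, only that the exercise is about to rehearse something other than what the on-call will open during a real incident.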

Where to start

Game days first. Master the format. Add fire drills only after the team can run a game day without leadership oversight. Fire-drilling a team that has not run a game day teaches the wrong lesson.

The progression. Quarter 1: first game day, leadership-supervised. Quarter 2: second game day, team-run. Quarter 3: third game day, junior IC. Quarter 4: first fire drill (planned by a senior engineer, a surprise to the on-call). The progression takes a year; rushing it produces poorly prepared teams.

The "just do it" temptation. Senior engineers want to skip to fire drills because they're more interesting. Resist. The team that hasn't built the game-day muscle will fail the fire drill in ways that erode trust rather than build skill.

Common antipatterns

The game day that's actually a tabletop. Engineers gather in a meeting room and discuss "what would we do if X happened." Useful but not a game day. Real game days execute against real systems; the action of running commands surfaces issues the discussion misses.

Fire drill announcements. Email goes out: "Fire drill happening sometime this quarter, watch for a real-looking page." The drill loses surprise; the team treats every page that quarter as suspect. Surprise is the value.

Game days without postmortems. Team runs the exercise; nobody writes down what they learned. By next quarter, the same gaps are still there. Treat game-day postmortems like real-incident postmortems.

Game days that always pass. The scenarios chosen are easy; the team handles them flawlessly; everyone feels good. The team isn't growing. Pick scenarios that stretch the team; failures during practice are the point.

What to do this week

Three moves. (1) Schedule the next game day on a Tuesday afternoon two weeks out. Pick a likely scenario; assign a junior IC. (2) Write a postmortem template specifically for game days, one that focuses on "what didn't work" rather than on the team's incident-response performance. (3) Identify your team's GM bench: who can run a fire drill in 6 months? Start training them now by having them shadow the next game day.