Game Day Evolution Over Years
Game days that don't evolve become routine. Shifting the focus each year keeps them useful.
Year 1
Year-one game days build muscle for the obvious failure modes. Common scenarios get rehearsed until response is reflexive; the goal is to remove improvisation from the high-frequency cases.
- Common scenarios. Deploy failure, region outage, vendor outage; the frequent failure modes get drilled first.
- Build muscle for the obvious. Repeat the response pattern until each step is reflexive; the operator should not have to stop and work out the next move during the incident.
- Documented injection script. Per-scenario named steps; the game day is reproducible and the lessons compare across runs.
- After-action review. Per-game-day lessons captured; supports incremental improvement and feeds the runbook updates.
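A documented injection script can be kept as data rather than prose, so that the same named steps are run each time and the results line up across runs. The sketch below is a minimal illustration under stated assumptions: the names `InjectionStep`, `GameDayRun`, `DEPLOY_FAILURE`, and `compare_runs`, the step names, and the choice of response time as the compared measure are all hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InjectionStep:
    """One named step in a scenario's injection script (hypothetical shape)."""
    name: str    # stable identifier, so runs can be compared step by step
    action: str  # what the facilitator does at this step


@dataclass
class GameDayRun:
    """Measurements from one run of a scenario (hypothetical shape)."""
    scenario: str
    label: str
    # Seconds the team took to respond, keyed by step name.
    response_seconds: Dict[str, float] = field(default_factory=dict)


# Illustrative script for the deploy-failure scenario; the steps are assumptions.
DEPLOY_FAILURE: List[InjectionStep] = [
    InjectionStep("break-deploy", "Ship a build that fails its health check"),
    InjectionStep("page-on-call", "Confirm the alert reaches the on-call engineer"),
    InjectionStep("rollback", "Wait for the team to roll back the deploy"),
]


def compare_runs(
    steps: List[InjectionStep], earlier: GameDayRun, later: GameDayRun
) -> Dict[str, float]:
    """Return the change in response time per step (later minus earlier).

    Steps that were not measured in both runs are left out.
    """
    deltas: Dict[str, float] = {}
    for step in steps:
        if step.name in earlier.response_seconds and step.name in later.response_seconds:
            deltas[step.name] = (
                later.response_seconds[step.name] - earlier.response_seconds[step.name]
            )
    return deltas
```

A negative delta means the team responded faster on the later run. Because the step names are fixed in the script, the after-action review can quote these per-step changes instead of relying on recollection.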
Year 2
Year-two game days stretch into less common scenarios: data corruption, security incidents, and multi-region failures. These unfamiliar-territory exercises surface gaps the obvious cases never touch.
- Less common scenarios. Data corruption, security incident, multi-region failure; the cases the team has not yet rehearsed.
- Stretch the team. Unfamiliar territory surfaces gaps in tooling, runbooks, and team coverage; the exercise is the audit.
- Runbook stress-test. Runbook-versus-reality check; catches stale guidance before the real incident relies on it.
- New-skill capture. Documented learnings per game day; supports team-level skill growth across rotations.
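The runbook-versus-reality check can be made concrete by comparing the steps the runbook documents with the steps the team actually took during the exercise. The following is a minimal sketch, assuming both are recorded as ordered lists of step names; the function name `runbook_drift` and the example step names are hypothetical.

```python
from typing import Dict, List


def runbook_drift(documented: List[str], observed: List[str]) -> Dict[str, List[str]]:
    """Compare documented runbook steps with the steps observed in a game day.

    Returns two lists, each in its original order:
      - "skipped": documented steps nobody performed (possibly stale guidance)
      - "undocumented": performed steps the runbook does not mention
    """
    documented_set = set(documented)
    observed_set = set(observed)
    return {
        "skipped": [step for step in documented if step not in observed_set],
        "undocumented": [step for step in observed if step not in documented_set],
    }


if __name__ == "__main__":
    # Illustrative step names only.
    documented = ["check-dashboard", "failover-db", "notify-support"]
    observed = ["check-dashboard", "restart-proxy", "failover-db"]
    print(runbook_drift(documented, observed))
```

Either list being non-empty is a prompt for the after-action review: a skipped step may be obsolete, and an undocumented step may be knowledge that currently lives only in one person's head.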
Year 3+
From year three onward, game days go cross-functional. The exercise tests the organisation, not just the engineering team; support, product, and communications all participate.
- Cross-team. Multi-team coordination exercise; tests handoffs and joint runbooks across team boundaries.
- Cross-functional. Support, product, communications involvement; tests the full incident chain end to end.
- Test the org. End-to-end organisational response; everyone with an incident-response role participates, not just engineering.
- Named exec sponsor. Leadership engagement per game day; supports sustained investment in reliability programmes.