The On-Call Tool Belt
What the on-call needs in 30 seconds. The tool belt and the keyboard shortcuts.
The tools
The tool belt is the small set of surfaces the on-call needs in the first 30 seconds of a page. Bookmarked, one or two clicks deep, predictable. Four surfaces carry most of the work: dashboard, incident channel, runbook search, logs and trace UI.
- Dashboard. At-a-glance service health: latency, error rate, saturation panels. First glance after the page lands.
- Incident channel. Standing Slack channel for declaring and coordinating. First place to drop "I'm IC, investigating X."
- Runbook search. Searchable runbook index per team. "What do I do for X" in seconds, not minutes.
- Logs and trace UI. Bookmarked observability views per service. Reachable in 1-2 clicks from the alert.
Shortcuts
Keyboard shortcuts compound across hundreds of pages a year. The investment per shortcut is small; the daily payback compounds. Documented and shared per team so new on-calls do not have to derive the set; version-controlled per engineer so the muscle memory survives machine moves.
- Shortcuts for daily tools. Keybinding set for Slack, dashboards, terminal. Pays back across hundreds of pages.
- Documented and shared. Published shortcut reference per team. New on-calls inherit the patterns.
- Version-controlled dotfiles. Keybindings tracked in git per engineer. Survives machine moves.
- Recommended starter pack. Curated team-level shortcut set. Onboarding consistency without re-explaining each engineer's setup.
Integrations
Integrations remove the navigation overhead between page and investigation. Page lands; the dashboard is already loaded with the right time window; the runbook URL is in the alert text; the recent deploy context is one click away. Critical when the page wakes the on-call at 3 AM and cognitive load is the bottleneck.
- Context-loaded dashboard from page. Linked dashboard URL with the right time window per alert. Saves 30+ seconds per page.
- "Where do I look" overhead removed. Cognitive cost reduction per page. Critical at 3 AM when context-switching is expensive.
- Runbook link embedded in alert. Matching runbook URL per alert. Catches "I know there is a runbook somewhere" delays.
- Recent-deploy context per alert. Affected-service deploy summary in the alert. Catches deploy-correlated failures fast.