tmux for On-Call Engineers
tmux split-pane for parallel investigation.
Splits
tmux for on-call is the discipline of using pane splits to organise incident response. Four panes give the engineer parallel views without context-switching, and the layout becomes muscle memory after the second incident.
- Pane 1: kubectl. Cluster commands. The on-call interacts with pods, runs targeted queries, and inspects state without leaving the layout.
- Pane 2: logs. Log tailing in a dedicated pane. Real-time logs flow alongside the kubectl pane so the engineer sees the effect of every command.
- Pane 3: dashboard. Text-based metrics or a top-equivalent. Continuous awareness of system load while the rest of the panes are doing targeted work.
- Pane 4: notes plus single-screen awareness. Scratchpad for incident notes (timestamps, observations, decisions), plus the four-panes-at-once view that keeps situational awareness high and context-switching low.
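A minimal sketch of the four-pane layout, written in Python so the tmux commands can be inspected before they run. The helper names `layout_commands` and `apply`, the window name `investigate`, and the namespace, deployment and notes-file names are hypothetical. The pane commands assume `kubectl top` works in the cluster (it needs metrics-server) and that `watch` is installed. Each split is followed by `select-layout tiled`, which keeps the panes in an even grid rather than repeatedly halving the newest pane.

```python
import shlex
import subprocess


def layout_commands(session: str, namespace: str, deployment: str,
                    notes_file: str = "incident-notes.md") -> list[list[str]]:
    """Hypothetical helper: tmux argv lists for a four-pane incident window."""
    window = f"{session}:investigate"
    ns = shlex.quote(namespace)
    # One shell command per pane, in the order the panes are created.
    pane_cmds = [
        f"kubectl get pods -n {ns}",                                      # Pane 1: kubectl
        f"kubectl logs -f deployment/{shlex.quote(deployment)} -n {ns}",  # Pane 2: logs
        f"watch -n 5 kubectl top pods -n {ns}",                           # Pane 3: dashboard
        f"vi {shlex.quote(notes_file)}",                                  # Pane 4: notes
    ]
    cmds = [["tmux", "new-session", "-d", "-s", session, "-n", "investigate"]]
    for i, pane_cmd in enumerate(pane_cmds):
        if i > 0:
            # split-window makes the new pane the window's active pane...
            cmds.append(["tmux", "split-window", "-t", window])
            # ...and re-tiling after each split keeps the grid even.
            cmds.append(["tmux", "select-layout", "-t", window, "tiled"])
        # "session:window" targets the active pane; "Enter" is tmux's key name.
        cmds.append(["tmux", "send-keys", "-t", window, pane_cmd, "Enter"])
    return cmds


def apply(cmds: list[list[str]]) -> None:
    """Run each tmux command in order, stopping at the first failure."""
    for argv in cmds:
        subprocess.run(argv, check=True)
```

Calling `apply(layout_commands("inc-1234", "payments", "api"))` and then `tmux attach -t inc-1234` would open the layout. Keeping the generator in the team runbook is one way to make the layout identical for every on-call.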
Save sessions
tmux sessions persist across SSH disconnects. The discipline is preserving incident state so reconnection resumes the work rather than restarting it.
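- Notes survive disconnect. The notes pane's content is preserved across detach, so the incident record is recoverable rather than lost in a closed terminal. The tmux server does not survive a reboot of the host, so write notes to a file rather than keeping them only in scrollback.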
- Detach and re-attach. The on-call can drop SSH and reconnect later. The tmux session continues, the panes are intact, the work resumes from exactly where it stopped.
- Survives network interruptions. A flaky network does not lose work: the processes keep running on the remote host, and the investigation resumes on reconnect, even over the worst home wifi.
- Multiple sessions plus documented layout. Different incidents map to different tmux sessions; the standard layout is documented so new on-calls inherit the convention rather than rebuilding it during the first page.
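One way to make reconnection routine is to derive the session name from the incident and always use `tmux new-session -A`, which attaches when the session already exists and creates it otherwise. The `inc-<id>` naming convention, the helper names and the bastion host below are hypothetical. Characters outside letters, digits, `_` and `-` are replaced because tmux uses `:` and `.` as separators in target names.

```python
import re


def incident_session_name(incident_id: str) -> str:
    """Hypothetical convention: 'inc-' plus a lowercase slug of the incident id."""
    # tmux treats ':' and '.' as target separators, so keep only [A-Za-z0-9_-].
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", incident_id.strip()).strip("-").lower()
    if not slug:
        raise ValueError(f"incident id {incident_id!r} has no usable characters")
    return f"inc-{slug}"


def attach_or_create(incident_id: str) -> list[str]:
    """new-session -A attaches if the session exists, otherwise creates it."""
    return ["tmux", "new-session", "-A", "-s", incident_session_name(incident_id)]


def reattach_over_ssh(host: str, incident_id: str) -> list[str]:
    """ssh -t allocates the terminal that tmux needs on the remote side."""
    return ["ssh", "-t", host, *attach_or_create(incident_id)]
```

After a dropped connection, re-running the same command (by hand, `ssh -t oncall-bastion tmux new-session -A -s inc-1234`) lands back in the running session with its panes intact; the default prefix binding `C-b d` detaches deliberately.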
Share
tmate (a tmux fork built for session sharing) or plain tmux, with several clients attached to one server, supports shared sessions. Multiple engineers collaborate during incidents; the discipline supports cross-team response without screen-sharing latency.
- tmate for cross-team incident response. Shared tmux sessions per incident. Multiple engineers attach; the same panes are visible to all; collaboration is real-time and terminal-fast.
- Pair view. Senior on-call mentors junior, security expert collaborates with SRE, IC sees what the responder sees. The discipline scales beyond solo work.
- Handoff and training. Shift-change handoffs use shared sessions to show the incoming engineer the current state; new on-calls shadow experienced ones and learn the layout in context.
- Documented sharing protocol. A written protocol per team for when to share and how to set it up, so the discipline is consistent rather than improvised under incident pressure.
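For sharing without tmate, the classic recipe is a tmux server on an explicit socket (`-S`) whose group permissions let teammates connect. This sketch assumes a hypothetical socket path in a directory the team group can traverse, a hypothetical `oncall` group, and tmux 3.3 or later for the `server-access` lines; older releases rely on socket permissions alone, so drop those lines there. `attach-session -r` attaches read-only, which suits an IC or a shadowing engineer.

```python
def shared_session_commands(socket_path: str, session: str, group: str,
                            guests: list[str],
                            read_only: bool = True) -> tuple[list[list[str]], list[str]]:
    """Hypothetical helper: (host setup commands, guest attach command)."""
    host = [
        # Start the incident session on an explicit socket instead of the default one.
        ["tmux", "-S", socket_path, "new-session", "-d", "-s", session],
        # Let the on-call group read and write the socket so members can connect.
        ["chgrp", group, socket_path],
        ["chmod", "660", socket_path],
    ]
    for user in guests:
        # tmux 3.3+ only: add the user to the server access list,
        # then make their clients read-only (-r) or writable (-w).
        host.append(["tmux", "-S", socket_path, "server-access", "-a", user])
        host.append(["tmux", "-S", socket_path, "server-access",
                     "-r" if read_only else "-w", user])
    # Guests attach through the same socket; -r requests a read-only client.
    attach = ["tmux", "-S", socket_path, "attach-session", "-t", session]
    if read_only:
        attach.append("-r")
    return host, attach
```

Writing the socket location, group and default read-only setting into the team's sharing protocol means the setup is the same on every shift rather than rediscovered mid-incident.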
tmux for on-call is one of those operational disciplines that pays off in incident response. Nova AI Ops integrates with infrastructure tooling, complementing terminal-based investigation with cluster-wide visibility.