Onboarding On-Call Engineers to Work Alongside Agents
On-call engineers have to trust the agent before they can work alongside it. What follows covers the 30-day onboarding curriculum, the shadow-mode period, and the first agent decisions a human should expect to override.
Week 1: shadow only
The agent runs read-only. The on-call sees the agent's hypothesis alongside their own.
There is no expectation that the on-call defers to the agent. They form their own opinion; the agent is a peer, not an authority.
Daily debrief: where the agent agreed with the on-call, where it disagreed, and in each disagreement, which of them was right.
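To make the daily debrief concrete, a team might record each shadow-week incident with both hypotheses and, once known, the confirmed cause. This sketch is illustrative only: `ShadowRecord`, `debrief`, and the outcome labels are hypothetical names, and it assumes hypotheses are normalized to short root-cause labels so they can be compared with plain string equality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ShadowRecord:
    """One shadow-week incident (hypothetical schema)."""
    incident_id: str
    agent_hypothesis: str   # assumed to be a normalized root-cause label
    oncall_hypothesis: str  # same labeling scheme as the agent's
    # Root cause confirmed after the incident; None while still unknown.
    confirmed_cause: Optional[str] = None


def debrief(records: Iterable[ShadowRecord]) -> Counter:
    """Tally agreements, and for each disagreement, which side matched the confirmed cause."""
    tally: Counter = Counter()
    for r in records:
        if r.agent_hypothesis == r.oncall_hypothesis:
            tally["agreed"] += 1
        elif r.confirmed_cause is None:
            tally["disagreed_unresolved"] += 1
        elif r.confirmed_cause == r.agent_hypothesis:
            tally["disagreed_agent_right"] += 1
        elif r.confirmed_cause == r.oncall_hypothesis:
            tally["disagreed_oncall_right"] += 1
        else:
            tally["disagreed_both_wrong"] += 1
    return tally
```

Unresolved disagreements are counted separately so they can be revisited in a later debrief rather than silently scored for either side.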
Weeks 2-3: agent-first triage
The agent triages first; the on-call reads the triage and decides whether to follow it or override it.
In most cases the on-call follows the agent. Every override is logged with a reason, and those reasons feed back into agent improvement.
The on-call still takes every action; the agent does not act in this phase.
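An override log along these lines can enforce the rule that every override carries a reason. This is a minimal sketch with a hypothetical schema (`TriageDecision`, `TriageLog`); a real team would more likely store decisions in its incident tooling than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TriageDecision:
    """The on-call's response to one agent triage (hypothetical schema)."""
    alert_id: str
    agent_triage: str
    followed: bool
    reason: Optional[str] = None  # required whenever followed is False
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # An override without a reason cannot feed back into agent improvement.
        if not self.followed and not (self.reason and self.reason.strip()):
            raise ValueError(f"override of {self.alert_id} needs a reason")


class TriageLog:
    """In-memory log of triage decisions for the agent-first phase."""

    def __init__(self) -> None:
        self.decisions: List[TriageDecision] = []

    def record(self, decision: TriageDecision) -> None:
        self.decisions.append(decision)

    def follow_rate(self) -> float:
        """Fraction of triages the on-call followed; 0.0 when nothing is logged yet."""
        if not self.decisions:
            return 0.0
        return sum(d.followed for d in self.decisions) / len(self.decisions)

    def override_reasons(self) -> List[str]:
        """Reasons for every override, ready to feed back into agent improvement."""
        return [d.reason for d in self.decisions if not d.followed and d.reason]
```

Rejecting a reasonless override at construction time, rather than in a later report, keeps the feedback loop from quietly filling with empty entries.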
Week 4: agent action with monitoring
The agent takes specific low-risk actions. The on-call monitors and intervenes if needed.
The action allowlist starts narrow (tag the alert, post a Slack update, create a ticket) and grows over the following weeks.
Confidence in the agent's actions builds gradually. Trust is earned per action class.
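One way to express the phase boundaries and the per-class allowlist is a small policy object. In this sketch the day-to-phase mapping follows the week boundaries described above, while the action identifiers (`tag_alert`, `post_slack_update`, `create_ticket`) and the `ActionPolicy` class are hypothetical stand-ins for whatever the agent's tool layer actually exposes.

```python
from enum import Enum
from typing import Iterable


class Phase(Enum):
    SHADOW = 1            # week 1: read-only, the agent only proposes a hypothesis
    AGENT_FIRST = 2       # weeks 2-3: the agent triages, the on-call takes every action
    MONITORED_ACTION = 3  # week 4 onward: the agent acts, limited to the allowlist


def phase_for_day(day: int) -> Phase:
    """Map a 1-based onboarding day to its phase using the week boundaries in the text."""
    if day < 1:
        raise ValueError("day is 1-based")
    if day <= 7:
        return Phase.SHADOW
    if day <= 21:
        return Phase.AGENT_FIRST
    return Phase.MONITORED_ACTION


class ActionPolicy:
    """Hypothetical gate the agent's tool layer would consult before acting."""

    def __init__(self, initial: Iterable[str]) -> None:
        # Copy so the caller's collection is never mutated by grant().
        self.allowlist = set(initial)

    def may_act(self, day: int, action: str) -> bool:
        # Acting requires both the right phase and an allowlisted action class.
        return phase_for_day(day) is Phase.MONITORED_ACTION and action in self.allowlist

    def grant(self, action: str) -> None:
        # Widen the allowlist by exactly one action class once it has earned trust.
        self.allowlist.add(action)
```

Keeping `grant` as an explicit, single-class call mirrors the idea that trust is earned per action class: nothing widens the allowlist except a deliberate decision.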
First overrides matter
The first time the agent gets something wrong and the on-call has to override it, the on-call's trust is tested.
Make the override visible. The team sees: "agent said X, on-call did Y, on-call was right because Z." Transparency builds trust.
Track override patterns. Repeated overrides for the same cause mean the prompt needs work.
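Both practices in this section, the team-visible override summary and the search for repeated causes, can be sketched together. The `cause` field assumes the team labels overrides with a small shared taxonomy of its own, and the default threshold of three is an arbitrary placeholder, not a recommended value; `Override`, `summarize`, and `recurring_causes` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Override:
    """One logged override (hypothetical schema)."""
    agent_said: str
    oncall_did: str
    cause: str        # short label from the team's own taxonomy (assumed)
    explanation: str  # why the on-call was right


def summarize(o: Override) -> str:
    """Render the override in the team-visible 'agent said X, on-call did Y' form."""
    return (f"agent said {o.agent_said}, on-call did {o.oncall_did}, "
            f"on-call was right because {o.explanation}")


def recurring_causes(overrides: Iterable[Override], threshold: int = 3) -> List[str]:
    """Causes overridden at least `threshold` times, sorted: candidates for prompt work."""
    counts = Counter(o.cause for o in overrides)
    return sorted(cause for cause, n in counts.items() if n >= threshold)
```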
When the on-call has "graduated"
The on-call is comfortable defaulting to the agent on standard cases and confident enough to override it when needed.
They add new cases to the eval suite drawn from their own experience. The on-call becomes an active participant in agent improvement.
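Turning a reviewed override into an eval case can be as simple as appending a record to the suite. This sketch assumes, hypothetically, that the eval suite is a JSON Lines file and that `EvalCase` matches its schema; the case ID doubles as a guard against adding the same incident twice.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class EvalCase:
    """One regression case for the agent's eval suite (hypothetical schema)."""
    case_id: str
    alert_context: str  # what the agent saw
    expected: str       # what the on-call did, judged correct in review
    rejected: str       # what the agent proposed instead
    note: str           # the on-call's reason, kept for reviewers


def add_eval_case(suite: Path, case: EvalCase) -> bool:
    """Append the case as one JSON line; return False if its ID is already present."""
    existing = set()
    if suite.exists():
        with suite.open() as f:
            existing = {json.loads(line)["case_id"] for line in f if line.strip()}
    if case.case_id in existing:
        return False
    with suite.open("a") as f:
        f.write(json.dumps(asdict(case)) + "\n")
    return True
```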
The agent becomes part of the team's tooling, not a curiosity. That is the success state.