Best Practices · Beginner · By Samson Tanimawo, PhD · Published Feb 17, 2026 · 5 min read

Change Management for Teams That Ship Daily

ITIL-style change management collapses under daily deploys, but "no process" collapses faster. Here is a lightweight change-management protocol that keeps the speed and adds the visibility.

Why heavyweight change management fails

Change-advisory boards meeting weekly do not scale to teams shipping ten times a day. The board becomes a rubber stamp or a bottleneck. Both kill the protocol. The fix is not to abandon governance; it is to compress it.

The pathology. Traditional ITIL-style change management was designed for teams shipping monthly. Adapted to weekly, it slows things down 4x. Adapted to daily, it grinds the team to a halt. Adapted to hourly (which is normal for modern teams), it becomes either theatre (everyone rubber-stamps) or sabotage (engineers route around it).

The correct response isn't to abandon change management — it's to redesign it for the modern cadence. The compressed protocol below covers the actual purposes of change management (visibility, peer review, rollback planning) without inserting humans in serial paths that don't need them.

Three categories

Every change fits into one of three buckets, and each gets a different protocol. The classification is the first move; everything else follows.

The categories balance speed and rigor. Standard changes ship without ceremony because the team has done them many times and the patterns are well understood. Normal changes have a peer review because the change is novel or the blast radius is wider. Emergency changes happen during incidents and use post-hoc review.

What you DON'T want is a four-or-more category system. Each additional category creates ambiguity ("is this a Type 2 or Type 3?") and decision fatigue. Three categories with clear thresholds is the sweet spot.
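The classification rule can be made mechanical. Here is a minimal sketch of the three-bucket routing as a decision function; the function and field names are illustrative, not from any particular tool:

```python
from enum import Enum


class Category(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


def classify(is_incident_fix: bool, pre_approved_pattern: bool) -> Category:
    """Route a change into one of the three buckets.

    Emergency wins: anything shipped under an active incident is
    emergency regardless of pattern. Otherwise, a pre-approved
    pattern is standard and everything else is normal.
    """
    if is_incident_fix:
        return Category.EMERGENCY
    if pre_approved_pattern:
        return Category.STANDARD
    return Category.NORMAL
```

The ordering encodes the priority: incident context overrides everything, and "normal" is the fallback, which matches the intent that novel changes default to peer review.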

Standard changes

Pre-approved patterns the team does dozens of times a week: deploys, scaling actions, rotating credentials. No approval needed. The team logs them so they are searchable later. 70-80% of all changes should fall here.

The "pre-approved" framing matters. The team isn't dodging review; the team has invested in a class of changes being safe enough that individual review adds no value. The PR review process, the deploy automation, the canary rollout — these are the review, baked into the workflow rather than added as a meeting.

What qualifies as standard. A change pattern qualifies when it is well-documented, has a tested rollback, has been executed at least 5-10 times before with no incidents, and has automated verification (CI tests plus canary metrics). All four criteria must be true; if any one is false, the change is normal, not standard.
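The all-must-hold rule is easy to state as an explicit check. A sketch, with illustrative field names of my own choosing:

```python
from dataclasses import dataclass


@dataclass
class ChangePattern:
    documented: bool        # a written runbook for the pattern exists
    rollback_tested: bool   # the rollback has actually been exercised
    prior_runs: int         # incident-free executions of this pattern
    automated_checks: bool  # CI tests + canary metrics gate the rollout


def is_standard(p: ChangePattern, min_runs: int = 5) -> bool:
    """A pattern is standard only if every criterion holds.

    If any single criterion fails, the change is normal, not standard.
    """
    return (p.documented
            and p.rollback_tested
            and p.prior_runs >= min_runs
            and p.automated_checks)
```

Making the check a conjunction (rather than a score) mirrors the article's rule: one missing criterion demotes the change, no partial credit.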

Normal changes

One-time changes with non-trivial blast radius: schema migrations, infrastructure rewrites, vendor swaps. Reviewed by one peer plus a service owner. Logged with the rollback plan attached. 15-20% of changes.

The two-reviewer rule. One peer (someone who can read the code or config and catch obvious mistakes). One service owner (someone who has context on what the service is for and can flag implications). Two different perspectives; two different kinds of catches.

The rollback plan is not optional. Normal changes must include a rollback plan in the change record — even if the rollback is "redeploy the previous version." Documenting the rollback forces the engineer to think about it, which often catches issues with the change itself ("wait, this is hard to roll back").

Emergency changes

The pager is firing and you are bleeding. Document after, not before. Two rules: a peer is on the call (not brought in after the fact), and a postmortem follows within 48 hours regardless of outcome.

The peer-on-call rule. Even in emergencies, a second pair of eyes catches obvious mistakes. The peer doesn't need to approve in the formal sense; they need to be on the bridge while the change happens. "Sara, I'm going to restart the database — does that sound right?" — 5 seconds of confirmation before destructive action.

The 48-hour postmortem rule. Emergency changes are the highest-risk category; reviewing them after the fact catches the patterns ("we keep needing emergency credential resets — let's fix the rotation"). Without postmortem rigor, emergency changes accumulate as the silent norm.

The 30-second change record

One link to the deployed change, one paragraph on what it does, one paragraph on how to roll back. Stored in the same system the team already uses (chat, ticket, doc, whatever). The barrier to writing one has to be lower than the barrier to skipping it.

The 30-second target is critical. Anything longer becomes friction; engineers skip it; the record stops being reliable. Terse is fine: links to PRs, deployment IDs, runbook URLs, as long as a future incident responder can reconstruct what happened.

Where the record lives. The same system the team is already using. Slack threads in #deploys. GitHub PR descriptions with a deploy template. Linear/Jira tickets. The point is unification: change records should be findable from the same place as everything else, not in a separate change-management portal that nobody opens.
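A record this small fits in a dozen lines. The sketch below mirrors the link/what/rollback structure and renders it as a terse message for whatever channel the team already uses; the class and field names are mine, not a standard:

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    link: str      # PR URL, deploy ID, or runbook link
    what: str      # one paragraph: what the change does
    rollback: str  # one paragraph: how to undo it

    def to_message(self) -> str:
        """Render as a terse post for the team's existing channel."""
        return (f"Change: {self.link}\n"
                f"What: {self.what}\n"
                f"Rollback: {self.rollback}")
```

Dropping `to_message()` output into a #deploys thread or a PR description keeps the record in the system the team already reads, which is the whole point.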

The 3-minute review

For normal changes, a single peer reads the record and either approves or asks one specific question. The peer is not gatekeeping; they are catching the obvious thing the author missed. Three minutes per change, ten changes a day: half an hour, less than a senior engineer's lunch hour.

The "one specific question" framing is important. Reviewers who ask vague questions ("is this safe?") get vague answers and don't catch anything. Reviewers who ask specific questions ("what happens if the migration is interrupted halfway?") catch real issues. Train the team to ask specific questions.

The review's leverage. Most "small" mistakes that cause incidents are obvious to a fresh pair of eyes — the original author was deep in the change and missed something a reviewer sees in 30 seconds. The 3-minute review captures most of that value at a fraction of the cost of a formal CAB meeting.

Common antipatterns

The "everyone needs to approve" review. Normal changes require approval from 5 people. Each takes a day. Total turnaround is a week. Effectively kills normal-change velocity. Limit approvers to 1-2 specific roles.

Standard changes that aren't actually standard. "Deploy" is supposedly standard, but the team treats every deploy as different. The classification didn't take. Either the pattern isn't actually standard yet (still novel), or the team needs cultural reinforcement that standard means standard.

Emergency changes that never postmortem. Team uses emergency-change classification to avoid review, even for changes that aren't really emergency. Without the 48-hour postmortem, emergency-change becomes the back door. Audit emergency-change frequency; if more than 10% of changes are emergency, the team is gaming the system.

Change records in a separate system. Slack thread in #deploys, but the change-management portal is also required. Engineers fill out one and skip the other. Pick one; eliminate the other.

What to do this week

Three moves. (1) Audit your last 50 changes. Classify each as standard, normal, or emergency. The split should be roughly 75/15/10. If emergency is over 20%, your team has a different problem (ad-hoc fire-fighting) that's masquerading as change management. (2) Document the standard-change patterns explicitly. The list is short (deploy, scale, restart, rotate); writing it down makes the classification unambiguous. (3) Pick a single tool for change records. Migrate any other system's change records into it within the next sprint. Unification is what makes records reliably findable.
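The first move, the audit, can be a short script over an export of recent changes. The input format below (a flat list of category labels) is an assumption; adapt it to whatever your deploy tooling or ticket system actually emits:

```python
from collections import Counter


def audit(categories: list[str]) -> dict[str, float]:
    """Return each category's share of recent changes as a percentage."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: 100 * counts[cat] / total
            for cat in ("standard", "normal", "emergency")}


# Example: last 50 changes, roughly the healthy 75/15/10 split.
split = audit(["standard"] * 38 + ["normal"] * 8 + ["emergency"] * 4)
assert split["emergency"] <= 20, "ad-hoc fire-fighting masquerading as change management"
```

On this sample the split comes out 76/16/8, inside the healthy band; an emergency share above the 20% threshold trips the assertion and signals the different problem the article names.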