The Shutdown Procedure Discipline for Decommissioning Services
Most services that should be retired are not. The 6-step procedure that actually retires services, with the bugs that hide in step 4.
The six steps
Service decommissioning has six discrete steps. Skipping any one of them produces orphan dependencies that surface as incidents months later.
- 1. Communicate. 90 days notice for internal consumers; 6 months for external; written, not Slack-DM'd.
- 2. Deprecation warnings. Logs and response headers carry deprecation notices; visibility before removal.
- 3. Migrate consumers. Each consumer team has a named owner; migration tracked to completion in a shared doc.
- 4. Stop accepting new traffic. The most error-prone step; some consumer always missed migration.
- 5. Wait. 30-day quiet period; the service runs but is unused; surfaces forgotten callers.
- 6. Tear down. Code, infrastructure, runbooks, dashboards; leave nothing behind.
What hides in step 4
Step 4 (cutting traffic) is where shutdowns go wrong. Three classes of forgotten consumer surface here, never earlier.
- Forgotten consumers. Scheduled jobs in other teams' repos that nobody updated; the team never read the deprecation email.
- External callers. Customers integrated against an internal endpoint that became external; you did not know they existed.
- Caching layers. Stale references in caches outlive the service; clients still resolve old endpoints.
- Mitigation. Soft-block first (return 410 Gone with a message); hard-block only after the soft-block traffic drops to zero.
Why bother
Decommissioning is unsexy work. The payoff is real and compounds; ignoring it accumulates technical and operational debt.
- Operational cost. Services that exist consume on-call attention even when unused; alerts fire, runbooks rot.
- Cognitive cost. Engineers cannot tell what is real and what is dead; investigation time grows.
- Security cost. Unmaintained services accumulate vulnerabilities; the next CVE has nowhere to land but here.
- Cloud cost. Every dead service consumes compute, storage, log volume; the bill quietly grows.