Blue-Green Database Migrations Without Downtime
Database changes are scary. The blue-green pattern, adapted for databases, lets you migrate with only a brief write pause instead of an outage.
Setup
The blue/green database pattern is the discipline of running two databases side by side and switching application traffic between them. The pattern enables low-risk database migrations, version upgrades, and instance-class changes by taking the in-place modification off the critical path. It is more complex than an in-place change; the safety and rollback benefits often justify the complexity.
What the setup looks like:
- Spin up a new database (green) alongside the old one (blue). The green database is provisioned with the target configuration: new version, new instance class, new schema. Provisioning does not affect the blue database; the application continues operating on blue.
- Replicate from blue to green continuously. A replication stream copies all changes from blue to green. The replication uses native database replication (logical replication for Postgres, binlog replication for MySQL) or a managed service such as AWS DMS. It runs continuously until the cutover.
- Validate green during the steady state. Before cutover, the team validates green: schema correctness, performance with production-like queries, replication lag stability. Issues caught here are fixed without affecting blue.
- Test cutover in staging first. The cutover procedure is exercised in staging or another non-production environment first. The test catches procedural issues, so the production cutover follows a known path.
- Plan for capacity. Running blue and green simultaneously costs more during the migration period, because the team pays for storage, compute, and network for both. The cost is bounded by the migration duration; the benefit is reduced migration risk.
The setup phase is the longest phase. It happens without urgency; the cutover is the brief, high-attention phase that follows.
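The "replication lag stability" check from the validation step can be sketched as a small function. This is a minimal illustration, not a prescribed method: the function name, the thresholds, and the idea of comparing the two halves of the sample window are all assumptions made for the example, and how the lag samples are collected is left to whatever monitoring the team already has.

```python
from statistics import mean


def lag_is_stable(samples_s, max_lag_s=5.0, max_growth_s=1.0):
    """Return True if replication lag stayed low and did not trend upward.

    samples_s: lag measurements in seconds, oldest first (hypothetical input).
    max_lag_s and max_growth_s are illustrative thresholds, not recommendations.
    """
    if len(samples_s) < 4:
        raise ValueError("need at least 4 samples to judge stability")
    # Any single sample above the ceiling fails the check.
    if max(samples_s) > max_lag_s:
        return False
    # Compare the newer half of the window against the older half
    # to detect lag that is still under the ceiling but climbing.
    half = len(samples_s) // 2
    growth = mean(samples_s[half:]) - mean(samples_s[:half])
    return growth <= max_growth_s
```

A check like this would run repeatedly during the steady state; a cutover would only be scheduled once it has passed consistently.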
The flip
The cutover is the moment when application traffic moves from blue to green. The procedure must be tight: the window during which writes are blocked must be brief enough to be acceptable.
- Drain writes to blue. The application briefly pauses writes (or routes them to a queue). The pause is the only customer-visible part of the migration; it must be brief.
- Verify replication caught up. With writes paused, replication lag should drop to zero once green has applied the remaining changes. The team verifies that green has every change blue has. Without this verification, the cutover can silently lose data.
- Flip application config to green. The application configuration (connection string, secret store entry, environment variable) is updated to point at green. The change applies on the next reload; orchestration tooling reloads the application.
- Resume writes. Application writes flow to green. The migration is operationally complete; blue no longer receives application writes and is retained for rollback until it is eventually decommissioned.
- Total flip duration: minutes. The whole cutover (drain, verify, flip, resume) takes minutes, not hours. Longer windows are signs of an undisciplined cutover; shorter windows are signs of preparation.
The flip is the high-risk moment. Preparation in the setup phase is what makes the flip routine.
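The drain, verify, flip, resume sequence above can be sketched as one function. This is a sketch under stated assumptions rather than a real implementation: the five callables (`pause_writes`, `resume_writes`, `blue_position`, `green_position`, `point_app_at`) are hypothetical hooks the team would supply, and "position" stands in for whatever monotonically increasing marker the replication mechanism exposes.

```python
import time


class CutoverAborted(Exception):
    """Raised when green fails to catch up before the deadline."""


def cutover(pause_writes, resume_writes, blue_position, green_position,
            point_app_at, catch_up_timeout_s=30.0, poll_interval_s=0.5):
    """Run drain -> verify -> flip -> resume; return the write pause in seconds.

    All callables are hypothetical hooks supplied by the caller.
    """
    started = time.monotonic()
    pause_writes()  # 1. drain: blue stops advancing from here on
    try:
        target = blue_position()
        deadline = time.monotonic() + catch_up_timeout_s
        # 2. verify: wait until green has applied everything blue has
        while green_position() < target:
            if time.monotonic() >= deadline:
                raise CutoverAborted(f"green has not reached position {target}")
            time.sleep(poll_interval_s)
        point_app_at("green")  # 3. flip: only reached after verification
    finally:
        # 4. resume: writes go to green on success, back to blue on abort
        resume_writes()
    return time.monotonic() - started
```

The `finally` block is the design choice worth noting: if green never catches up, the application configuration is untouched, so writes resume against blue and the attempt costs only the pause.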
Rollback
The blue/green pattern's main value is rollback safety. If green has unexpected issues after cutover, the team can move back to blue. The rollback path is part of the design, not an afterthought.
- Reverse: replicate green back to blue. After cutover, replication is reversed: green is now primary, and blue receives its changes. The reverse replication keeps blue current; if rollback is needed, blue is ready.
- Flip back if needed. The same procedure that flipped to green flips back to blue. The team has practiced both directions; either is operationally feasible.
- Window of risk: between drain and flip. The brief moment when writes are paused is when an issue would be most disruptive. The window is short by design; preparation minimizes it further.
- Keep it short. The drain-to-flip window is the operational risk. Tight procedures, automation, and a rehearsed cutover all keep it short. A window measured in seconds is the goal; minutes are acceptable; anything longer is poor execution.
- Eventual decommission. After enough time has passed without issues, blue is decommissioned, and the cost of running both databases ends. The retention period for blue depends on the team's confidence in green; days to weeks is typical.
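Two of the decisions in this list, whether blue is still a usable rollback target and whether it is safe to decommission, can be written down as explicit checks. The sketch below is illustrative only: the function names, the lag threshold, the 14-day default retention (picked from the "days to weeks" range above), and the `open_incidents` input are all assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def blue_is_rollback_ready(reverse_replication_running, blue_lag_s, max_lag_s=5.0):
    """Blue is a valid rollback target only while it keeps receiving green's changes."""
    return reverse_replication_running and blue_lag_s <= max_lag_s


def can_decommission_blue(cutover_at, now, retention=timedelta(days=14),
                          open_incidents=0):
    """Allow decommission once the retention period has passed with no open issues.

    retention is an illustrative default; open_incidents is a hypothetical
    count of unresolved problems attributed to green.
    """
    return open_incidents == 0 and (now - cutover_at) >= retention


# Example inputs (made up for illustration).
cutover_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
three_days_later = cutover_at + timedelta(days=3)
three_weeks_later = cutover_at + timedelta(days=21)
```

With these inputs, `can_decommission_blue(cutover_at, three_days_later)` is false and `can_decommission_blue(cutover_at, three_weeks_later)` is true, provided no incidents are open.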
The blue/green database pattern is one of the most powerful techniques for safe database migrations. Nova AI Ops integrates with database telemetry, surfaces replication lag and cutover health, and produces the operational visibility teams need to execute these migrations confidently.