Spot Fleet Diversification
Diversify spot capacity across instance types and availability zones to limit the impact of interruptions.
Overview
Spot fleet diversification spreads capacity across instance types and availability zones so the loss of any one pool does not collapse the workload. Without diversification, spot savings come with the recurring cost of mass interruption.
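- Spread across instance types. A fleet limited to a single instance type or family tends to be interrupted all at once when capacity tightens. Mix at least 4 compatible types, preferably 6.
- Spread across AZs. Each spot pool is specific to one instance type in one AZ, so capacity in one AZ can tighten while another stays healthy. Spanning 3 AZs in a region is the floor, not the ceiling.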
- Capacity-optimized allocation. The capacity-optimized strategy chooses pools with the deepest available capacity rather than the cheapest, which trades a small price premium for a much lower interruption rate.
- Quarterly review. Spot prices and interruption patterns change. A quarterly review catches drift before it turns into an outage.
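The two axes above multiply: every combination of instance type and AZ is a separate pool the fleet can draw from. The following is a minimal sketch of that arithmetic. The instance types and AZ names are example values chosen for illustration, not a recommendation for any particular workload.

```python
from itertools import product

# Example values only (assumptions); substitute the types and AZs
# that your workload can actually run on.
INSTANCE_TYPES = ["m5.xlarge", "m5a.xlarge", "m6i.xlarge", "m5d.xlarge", "r5.xlarge"]
AVAILABILITY_ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def spot_pools(instance_types, availability_zones):
    """Return every (instance type, AZ) pair; each pair is one spot pool."""
    return list(product(instance_types, availability_zones))


pools = spot_pools(INSTANCE_TYPES, AVAILABILITY_ZONES)
print(len(pools))  # 5 types x 3 AZs = 15 pools
```

A fleet pinned to one type in one AZ has a single pool, so losing that pool means losing the whole fleet. With 15 pools, losing one removes only part of the available capacity.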
The approach
The practical approach pairs diversification across two axes with a documented policy that explains the trade-off so the next operator does not undo the design.
- Per-fleet instance mix. Define the eligible instance types in code. CI rejects fleet definitions with fewer than 4 types unless an exception is granted.
- AZ spread. The fleet spans every AZ that hosts the workload. Single-AZ fleets are a deliberate exception, not the default.
- Capacity-optimized strategy. Default the allocation strategy to capacity-optimized. Reserve lowest-price for short-lived batch jobs that tolerate interruption.
- Documented rationale. The fleet config carries a comment explaining the diversification decision. Future operators inherit context, not just configuration.
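The policy above could be enforced in CI with a check along these lines. This is a sketch under stated assumptions: the fleet definition is a plain dictionary, and the field names (`instance_types`, `availability_zones`, `allocation_strategy`, `short_lived_batch`, `exception_granted`, `rationale`) are a hypothetical schema, not the format of any real provisioning tool.

```python
MIN_INSTANCE_TYPES = 4
DEFAULT_STRATEGY = "capacity-optimized"


def validate_fleet(fleet):
    """Return a list of policy violations for one fleet definition.

    The dictionary schema is hypothetical and exists only for this sketch.
    """
    errors = []
    exception_granted = fleet.get("exception_granted", False)

    # Instance mix: fewer than 4 distinct types needs an explicit exception.
    types = set(fleet.get("instance_types", []))
    if len(types) < MIN_INSTANCE_TYPES and not exception_granted:
        errors.append(
            f"only {len(types)} instance types; need at least {MIN_INSTANCE_TYPES}"
        )

    # AZ spread: single-AZ fleets are a deliberate exception, not the default.
    zones = set(fleet.get("availability_zones", []))
    if len(zones) < 2 and not exception_granted:
        errors.append("single-AZ fleet requires an exception")

    # Allocation strategy: lowest-price is allowed for short-lived batch jobs only.
    strategy = fleet.get("allocation_strategy", DEFAULT_STRATEGY)
    if strategy == "lowest-price" and not fleet.get("short_lived_batch", False):
        errors.append("lowest-price is allowed for short-lived batch jobs only")

    # Documented rationale: the config must explain the diversification decision.
    if not fleet.get("rationale", "").strip():
        errors.append("missing rationale for the diversification decision")

    return errors


# Example fleet with too few instance types (illustrative values).
example_fleet = {
    "name": "web-workers",
    "instance_types": ["m5.xlarge", "m5a.xlarge", "m6i.xlarge"],
    "availability_zones": ["us-east-1a", "us-east-1b", "us-east-1c"],
    "allocation_strategy": "capacity-optimized",
    "rationale": "Stateless web tier; any general-purpose xlarge works.",
}

for problem in validate_fleet(example_fleet):
    print(f"{example_fleet['name']}: {problem}")
```

Returning a list of violations rather than raising on the first one lets CI report every problem in a single run. A non-empty list would fail the build.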
Why this compounds
Each well-diversified fleet teaches the team a little more about how spot capacity behaves. The institutional knowledge survives the engineer who set it up.
- Reliability with savings. Diversification keeps most of the 70 to 90 percent cost savings spot can offer while cutting the interruption rate, potentially by an order of magnitude.
- Workload-matched strategy. Different workloads need different mixes. The team learns which workloads tolerate cheap-pool churn and which need capacity-optimized.
- Cross-team reuse. A diversified fleet template becomes the starting point for the next service. The first fleet is investment; the next ones are routine.
- Quarterly compounding. Each review tunes the mix. After a year, the fleet definitions are mature and the interruption rate drops to background noise.
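One way a quarterly review might tune the mix is to compute a per-pool interruption rate and flag the pools that exceed a threshold. The sketch below assumes the team already records launches and interruptions per pool. The input shape, the 20 percent threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def interruption_rates(launches, interruptions):
    """Return {pool: interruptions / launches} for every pool that launched.

    Both arguments are lists of (instance type, AZ) tuples, one entry per event.
    """
    launched = Counter(launches)
    interrupted = Counter(interruptions)
    return {pool: interrupted[pool] / count for pool, count in launched.items()}


def pools_to_review(rates, threshold=0.20):
    """Return pools whose interruption rate exceeds the (assumed) threshold."""
    return sorted(pool for pool, rate in rates.items() if rate > threshold)


# Made-up sample data for one quarter.
launches = [("m5.xlarge", "us-east-1a")] * 10 + [("r5.xlarge", "us-east-1b")] * 10
interruptions = [("m5.xlarge", "us-east-1a")] * 4 + [("r5.xlarge", "us-east-1b")] * 1

rates = interruption_rates(launches, interruptions)
print(pools_to_review(rates))
```

With this sample data, the first pool is interrupted on 4 of 10 launches and is flagged, while the second, at 1 of 10, is not. A flagged pool is a candidate for replacement in the next revision of the fleet definition.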