Cloud & Infrastructure Intermediate By Samson Tanimawo, PhD Published Dec 6, 2026 10 min read

Spot Instances at Scale: When the Savings Are Real

Spot pricing is genuinely amazing for the right workloads. The wrong workload makes the savings vanish into operational toil.

Why spot is so cheap

Cloud providers sell unused capacity at a deep discount, reserving the right to reclaim it on short notice: two minutes on AWS, and as little as 30 seconds on some other providers. The discount is real (typically 60-90% off on-demand). The catch is the interruption.
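To make the notice window concrete, here is a minimal sketch of how a worker might work out how much time it has left. It assumes an AWS-style interruption notice, which the instance metadata service exposes at the spot/instance-action path as a small JSON document with an action and a UTC timestamp. The sample payload and the function name are made up for illustration, and fetching the notice over HTTP is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds remaining before the action named in the notice."""
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as "2026-12-06T08:22:00Z".
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical payload, shaped like an AWS spot interruption notice.
sample = '{"action": "terminate", "time": "2026-12-06T08:22:00Z"}'
now = datetime(2026, 12, 6, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample, now)
print(remaining)  # 120.0
```

Whatever the worker does on shutdown (drain connections, flush a checkpoint, deregister from a load balancer) has to fit inside that number.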

For workloads that tolerate interruption, the savings are pure. For workloads that do not, spot creates expensive incidents.
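A rough way to see the difference is to charge the spot bill for the compute that has to be redone after interruptions. The sketch below is an illustrative model, not a pricing formula from any provider: the function name, the 70% discount, and the rework fractions are all assumptions, and it counts only compute, not the cost of the incident itself.

```python
def effective_savings(discount: float, rework_fraction: float) -> float:
    """Fraction saved versus on-demand once redone work is paid for.

    discount: spot discount relative to on-demand (0.70 means 70% off).
    rework_fraction: extra compute redone because of interruptions
                     (0.05 means 5% more instance-hours).
    """
    spot_cost = (1.0 - discount) * (1.0 + rework_fraction)
    return 1.0 - spot_cost


# Interruption-tolerant batch work: little is lost on each interruption.
tolerant = effective_savings(discount=0.70, rework_fraction=0.05)

# A job that keeps restarting from scratch: most hours are redone.
fragile = effective_savings(discount=0.70, rework_fraction=2.5)

print(round(tolerant, 3))  # 0.685
print(round(fragile, 3))   # -0.05
```

A negative result means the workload cost more on spot than it would have on-demand, before counting any engineer time.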

Workloads where spot wins

Workloads where spot is a trap

Stateful databases. Two minutes is not enough to drain a primary safely.

Long-running jobs without checkpoints. 6-hour ML training that must restart from scratch on interrupt.

Anything with a strict SLO. Even diversified, spot has tail interruption events.
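For the long-running-job case, the usual escape hatch is checkpointing: persist progress often enough that an interruption costs minutes, not hours. The following is a minimal sketch of the idea, assuming progress can be summarized as a step counter. The function name, file layout, and stop_after parameter (which only simulates an interruption) are hypothetical.

```python
import json
import os
import tempfile
from typing import List, Optional


def run_job(total_steps: int, checkpoint_path: str,
            stop_after: Optional[int] = None) -> List[int]:
    """Run the remaining steps, resuming from the checkpoint if one exists.

    Returns the list of steps executed during this invocation.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["step"]

    executed = []
    for step in range(done + 1, total_steps + 1):
        # ... the real work for this step would go here ...
        executed.append(step)

        # Write to a temp file, then rename, so an interruption mid-write
        # never leaves a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"step": step}, f)
        os.replace(tmp_path, checkpoint_path)

        if stop_after is not None and step >= stop_after:
            break  # simulated spot interruption
    return executed


path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first_run = run_job(10, path, stop_after=4)  # interrupted after step 4
second_run = run_job(10, path)               # resumes at step 5
print(first_run)   # [1, 2, 3, 4]
print(second_run)  # [5, 6, 7, 8, 9, 10]
```

On a real spot instance the checkpoint would need to live on storage that survives the instance, such as an object store or a network volume, since local disk typically goes away with the machine.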

Diversification math

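Diversify across instance families and AZs. If interruptions in different spot pools are independent, the probability of every pool being interrupted at the same time is the product of the individual probabilities, which becomes very small as you add pools. In practice pools are somewhat correlated (a regional capacity crunch hits several at once), so treat the product as a lower bound.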
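The arithmetic is short enough to sketch. The example below assumes independent pools and uses a made-up 5% chance that any one pool is interrupted in a given window; neither number comes from a provider.

```python
import math
from typing import Sequence


def prob_all_interrupted(pool_probs: Sequence[float]) -> float:
    """Probability that every pool is interrupted in the same window.

    Assumes interruptions are independent across pools, which is optimistic.
    """
    return math.prod(pool_probs)


# Hypothetical per-pool interruption probability for one time window.
single_pool = prob_all_interrupted([0.05])
four_pools = prob_all_interrupted([0.05] * 4)

print(single_pool)          # 0.05
print(f"{four_pools:.2e}")  # 6.25e-06
```

Going from one pool to four drops the chance of losing all capacity at once by several orders of magnitude under this assumption. Correlated events push the real figure higher.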

Modern tooling (Karpenter, or EC2 Auto Scaling groups with a mixed instances policy) handles diversification automatically; the manual era is over.

Antipatterns

What to do this week

Three moves. (1) Pick the workload in your environment most exposed to spot interruptions. (2) Apply the lightest fix and measure for one week. (3) Schedule a quarterly review so the discipline does not rot.