Performance | Intermediate | By Samson Tanimawo, PhD | Published Dec 17, 2026 | 10 min read

Capacity Planning Without a Crystal Ball: A Practical Framework

Capacity planning is not forecasting. It is knowing when you must act, and what to do when you do. The framework: four inputs, one decision rule, one quarterly review.

Why "predict next year's traffic" fails

Spreadsheet forecasts assume linearity that rarely holds. A new launch doubles traffic in a week. A churn event halves it overnight. The forecast is wrong by month three; the capacity plan is wrong with it. Building a model that predicts the future precisely is not the right problem.

The right problem: maintain enough headroom to absorb realistic spikes, and detect when you are eating into headroom faster than you can react. Capacity planning is a control system, not a forecasting exercise.

The four inputs that matter

1. Current peak utilization. p99 of your binding resource (often CPU; sometimes memory, sometimes IOPS) over the last 30 days.

2. Lead time to scale. How long from "we need more" to "we have more." Includes procurement, deploy, warm-up. For most cloud workloads, minutes; for some, weeks.

3. Plausible spike size. The biggest non-pathological surge: a successful launch, a viral mention, a holiday peak. Usually 2-3x normal.

4. Trend. Are you growing? Shrinking? Steady? The 90-day slope of utilization tells you which.
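The four inputs can be captured in a small record per service. This is a minimal sketch; the field names and the example values are illustrative, not prescribed by the framework.

```python
from dataclasses import dataclass

@dataclass
class CapacityInputs:
    """The four capacity-planning inputs for one service."""
    peak_utilization: float   # p99 of the binding resource over 30 days (0.0-1.0)
    lead_time_days: float     # time from "we need more" to "we have more"
    spike_factor: float       # plausible surge multiplier, typically 2.0-3.0
    daily_growth_rate: float  # 90-day utilization trend, as a fraction per day

# Example: a service peaking at 30% CPU, with a 2-day scaling lead time,
# 2x plausible spikes, and utilization growing 0.2% per day.
api = CapacityInputs(peak_utilization=0.30, lead_time_days=2.0,
                     spike_factor=2.0, daily_growth_rate=0.002)
```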

The headroom decision rule

The simple rule: maintain enough headroom that peak utilization × plausible spike size < 90%. If peak is 30% and spikes are 2x, that is 60%: fine. If peak is 50% and spikes are 2x, that is 100%; you are one spike away from saturation.

The 10% left above the 90% threshold accounts for noise and partial-failure scenarios (a node restart that drops a fifth of capacity for 60 seconds). Tighter than that and the system has no recovery slack.
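The decision rule is a one-line check. This sketch encodes it directly, using the two examples from the text:

```python
HEADROOM_THRESHOLD = 0.90  # leave 10% slack for noise and partial failures

def has_headroom(peak_utilization: float, spike_factor: float) -> bool:
    """True if a plausible spike would still land under the 90% threshold."""
    return peak_utilization * spike_factor < HEADROOM_THRESHOLD

print(has_headroom(0.30, 2.0))  # 30% peak, 2x spike -> 60%: True, fine
print(has_headroom(0.50, 2.0))  # 50% peak, 2x spike -> 100%: False
```

Note that the check uses the spike as a multiplier on peak utilization, which matches the "usually 2-3x normal" framing of input 3.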

Scaling before traffic forces you

The trigger to act is when projected peak crosses 90%: grow current peak at the 90-day trend over one lead time, then apply the plausible spike factor. Scale before then. Reactive scaling (waiting until the alert fires) leaves no buffer for the lead time itself.
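The text describes the projection loosely; one reading consistent with the headroom rule is to grow the current peak at the observed trend over one lead time and then apply the spike factor. A sketch under that assumption:

```python
def projected_peak(peak: float, daily_growth_rate: float,
                   lead_time_days: float, spike_factor: float) -> float:
    """Peak utilization after growing at trend for one lead time, then spiking."""
    grown = peak * (1.0 + daily_growth_rate) ** lead_time_days
    return grown * spike_factor

def must_scale_now(peak: float, daily_growth_rate: float, lead_time_days: float,
                   spike_factor: float, threshold: float = 0.90) -> bool:
    """True when waiting one more lead time would let a spike cross the threshold."""
    return projected_peak(peak, daily_growth_rate,
                          lead_time_days, spike_factor) >= threshold

# 44% peak, growing 0.3%/day, 14-day lead time, 2x spikes:
# a spike after one more lead time would exceed 90%, so act now.
print(must_scale_now(0.44, 0.003, 14.0, 2.0))
```

The key point the code makes explicit: the lead time appears inside the projection, which is why reactive scaling leaves no buffer for it.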

The mechanism. Autoscalers handle the minute-to-minute scaling. The capacity-planning question is whether the autoscaler has enough max capacity to scale into. If max replicas is 50 and you project needing 60, raise the cap proactively.
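The "project needing 60" step can be made concrete. This sketch assumes load spreads evenly across replicas and utilization scales linearly with traffic; the 60% per-replica target is an assumed parameter, not part of the framework.

```python
import math

def replicas_needed(current_replicas: int, projected_utilization: float,
                    per_replica_target: float = 0.60) -> int:
    """Replicas required so projected load lands at the per-replica target.
    Assumes even load spreading and linear scaling with traffic."""
    return math.ceil(current_replicas * projected_utilization / per_replica_target)

# If 40 replicas would sit at 90% under the projection, a 60% target needs 60
# replicas; with a max-replicas cap of 50, raise the cap before traffic forces it.
needed = replicas_needed(40, 0.90)
print(needed, "needed; raise the cap" if needed > 50 else "cap is sufficient")
```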

The quarterly review. Every quarter, recalculate the four inputs and the headroom number for your top-10 services. Decisions to act go on the engineering roadmap; decisions to do nothing get documented (so the next quarter's review starts from data, not memory).

Antipatterns

Provisioning for the worst-case spike. Doubles cost for a benefit that rarely materializes. Provision for the plausible spike, with a clear escalation path for the rare ones.

No capacity owner. Without an owner, the quarterly review skips itself. Pick one engineer per platform; rotate annually.

Forecasting headcount as a proxy for capacity. Headcount and traffic decouple constantly. Use traffic-driven capacity directly.

What to do this week

Three moves. (1) Calculate peak × spike for your top-3 services. (2) Identify any service over the 90% threshold; plan capacity for it before the next sprint. (3) Schedule the quarterly capacity review on the calendar with an owner; without one, the framework dies.