EBS Volume Rightsizing Discipline
Most EBS volumes are oversized. This section covers the audit that catches it and the savings that follow.
The audit
The volume rightsizing audit collects three things: per-volume storage utilisation (used bytes vs provisioned), per-volume IOPS (provisioned vs consumed), and a top-N list sorted by absolute waste. The top 10 volumes typically account for 60-80% of optimisation potential, which is where the focused work pays back.
- Per-EBS storage utilisation. Used bytes vs provisioned via CloudWatch agent; the basic waste indicator.
- Per-EBS IOPS. Provisioned vs consumed; gp3 decouples IOPS from size, gp2 couples them; many teams overprovision both.
- Top-N waste list. Sorted by absolute waste (provisioned minus consumed); top 10 account for 60-80% of optimisation potential.
- Per-volume audit deliverable. Documented utilisation and IOPS shape; supports rightsizing decisions.
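The top-N ranking can be sketched as follows. This is a minimal illustration, assuming used and provisioned sizes have already been collected per volume; the `VolumeUsage` record and the `top_waste` function are hypothetical names, not part of any AWS tool.

```python
from dataclasses import dataclass


@dataclass
class VolumeUsage:
    # Hypothetical record; in practice the figures would come from the audit data.
    volume_id: str
    provisioned_gib: float
    used_gib: float


def waste_gib(volume):
    """Absolute waste: provisioned minus used, never negative."""
    return max(volume.provisioned_gib - volume.used_gib, 0.0)


def top_waste(volumes, n=10):
    """Return the n most wasteful volumes and their share of total waste."""
    ranked = sorted(volumes, key=waste_gib, reverse=True)
    top = ranked[:n]
    total = sum(waste_gib(v) for v in volumes)
    share = sum(waste_gib(v) for v in top) / total if total else 0.0
    return top, share
```

The returned share is the figure to compare against the 60-80% rule of thumb: if the top 10 hold most of the waste, the rest of the fleet can wait.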
Right-sizing down
Right-sizing down has rules. A volume whose storage utilisation stays under 50% for 30 days is a candidate. EBS does not support decreasing a volume's size in place, and copying data to a smaller volume online is operationally risky, so most teams replace the volume at instance refresh instead. Where consumed IOPS sit well below the provisioned figure, drop the extra provision; gp3’s 3,000 IOPS baseline is included by default.
- Under 50% for 30 days. Candidate for resize down; the sustained utilisation rule.
- Shrink at refresh. Online shrinking is operationally risky; replace at instance refresh during the next deploy.
- Drop IOPS provision. Under-consumed IOPS get dropped; gp3 baseline 3000 IOPS is the default.
- Per-volume rightsize plan. The plan documented per volume; supports staged execution across the fleet.
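The 50%-for-30-days rule can be expressed as a small check. This is a sketch under stated assumptions: utilisation is supplied as one fraction (used divided by provisioned) per day, oldest first, and the function names are hypothetical. The `target_utilisation` default of 0.7 is an illustrative headroom choice, not a figure from the audit.

```python
import math


def is_downsize_candidate(daily_utilisation, threshold=0.5, window_days=30):
    """True if every one of the last `window_days` samples is under `threshold`."""
    if len(daily_utilisation) < window_days:
        return False  # not enough history to call the low utilisation sustained
    recent = daily_utilisation[-window_days:]
    return max(recent) < threshold


def suggested_size_gib(peak_used_gib, target_utilisation=0.7):
    """Smallest whole-GiB size that keeps peak usage at or below the target."""
    # target_utilisation is an assumed headroom policy; tune it per workload.
    return math.ceil(peak_used_gib / target_utilisation)
```

Requiring the whole window to be under the threshold, rather than the average, keeps a volume with a brief monthly peak off the candidate list.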
Upgrade gp2 to gp3
gp3 is the better choice than gp2 for almost all new workloads: same storage, a 3,000 IOPS baseline regardless of size, and lower cost per GB. The exception to watch is a gp2 volume larger than 1,000 GiB, whose size-linked baseline exceeds 3,000 IOPS; matching it on gp3 means provisioning IOPS explicitly. Migration is a single modify-volume API call per existing volume, online with no downtime; performance often improves and cost drops 20%. The reason teams haven’t done it is that it requires explicit action.
- gp3 better in almost all cases. Same storage, 3,000 IOPS baseline at any size, lower cost per GB.
- Single API call migration. modify-volume; online, no downtime; the migration is mechanically trivial.
- 20% cost drop. Performance often improves; the migration pays back immediately.
- Per-fleet migration tracker. Documented gp2-to-gp3 progress; supports completing the migration at fleet scale.
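A minimal sketch of preparing that call is shown below. It only builds the request parameters; the function names are hypothetical, and the choice to carry over a gp2 baseline above 3,000 IOPS is an assumption about what "no performance loss" should mean. The gp2 figures used (3 IOPS per GiB, minimum 100, maximum 16,000) are the published gp2 baseline rules.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of this size receives."""
    return max(GP2_MIN_IOPS, min(GP2_IOPS_PER_GIB * size_gib, GP2_MAX_IOPS))


def gp3_modify_params(volume_id, size_gib):
    """Build modify-volume parameters for moving one gp2 volume to gp3."""
    params = {"VolumeId": volume_id, "VolumeType": "gp3"}
    current = gp2_baseline_iops(size_gib)
    if current > GP3_BASELINE_IOPS:
        # Assumption: preserve the existing baseline rather than drop to 3,000.
        params["Iops"] = current
    return params
```

With boto3, the resulting dictionary could be passed as `ec2.modify_volume(**params)`. Throughput is not handled here: gp3 starts at 125 MiB/s while larger gp2 volumes can reach 250 MiB/s, so throughput-bound workloads may need the `Throughput` parameter as well.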
Typical savings
First-pass audits find 30-50% of EBS spend reducible without performance loss, mostly from oversized provisions and unnecessarily high IOPS. Recurring quarterly audits find another 5-10% each time as workloads grow and oversizing creeps back; automation tools generate recommendations that engineers approve and apply.
- 30-50% first-pass. Reducible without performance loss; mostly oversized provisions and unnecessarily high IOPS.
- 5-10% recurring quarterly. Workloads grow; oversizing creeps back; the audit is a recurring discipline.
- Automation tools. AWS Compute Optimizer, third-party FinOps platforms generate recommendations; engineers approve and apply.
- Per-quarter savings tracking. Documented savings per cycle; supports continued investment in the discipline.
What to watch out for
Three risks deserve attention: burst credit dynamics on gp2 (right-sizing can eliminate burst capacity needed during peaks), snapshot lineage (volume deletion after replacement should not orphan snapshots), and filesystem contraction. Filesystem expansion is easy and contraction is hard, so plan for growth and right-size conservatively rather than aiming for 100% utilisation.
- Burst credit dynamics. gp2 uses I/O credits; right-sizing storage may eliminate burst capacity needed during peaks.
- Snapshot lineage. Volume deletion after replacement should not orphan snapshot ancestry; verify before deletion.
- Expansion easy, contraction hard. Filesystem expansion is one-way easy; plan for growth, right-size conservatively.
- Per-rightsize verification. Each rightsize verified for performance hold; supports safe execution.
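The burst credit risk can be made concrete with a little arithmetic. The sketch below assumes the documented gp2 model: a bucket of 5.4 million I/O credits, bursts up to 3,000 IOPS, and a baseline of 3 IOPS per GiB (minimum 100, maximum 16,000). The function names are hypothetical.

```python
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full gp2 bucket
GP2_BURST_IOPS = 3_000         # burst ceiling for gp2 volumes under 1,000 GiB


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to 100..16,000."""
    return max(100, min(3 * size_gib, 16_000))


def gp2_burst_seconds(size_gib):
    """Seconds a full credit bucket sustains 3,000 IOPS, or None if no burst applies."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None  # baseline already meets or exceeds the burst ceiling
    # Credits drain at (burst rate - baseline refill rate) per second.
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)
```

Under this model, shrinking a gp2 volume from 500 GiB to 100 GiB cuts the baseline from 1,500 to 300 IOPS and the full-bucket burst from 3,600 seconds to 2,000 seconds, which is the kind of change worth checking against peak I/O before resizing.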