EBS Volume Rightsizing Discipline

Most EBS volumes are oversized. The audit that catches it and the savings that follow.

The audit

The volume rightsizing audit collects two metrics per volume and produces one ranked list: storage utilisation (used bytes vs provisioned), IOPS (provisioned vs consumed), and a top-N list sorted by absolute waste. The top 10 volumes typically account for 60-80% of the optimisation potential, which is where focused work pays back.

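Per-EBS storage utilisation. Used bytes vs provisioned size, collected from the filesystem by the CloudWatch agent because EBS itself does not report used space; the basic waste indicator.
Per-EBS IOPS. Provisioned vs consumed, where consumed IOPS is the VolumeReadOps plus VolumeWriteOps sum divided by the period in seconds. gp3 decouples IOPS from size; gp2 ties them at 3 IOPS per GiB, so teams often oversize gp2 volumes just to get IOPS and end up overprovisioning both.
Top-N waste list. Sorted by absolute waste (provisioned minus consumed); the top 10 usually account for 60-80% of optimisation potential.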
Per-volume audit deliverable. Documented utilisation and IOPS shape; supports rightsizing decisions.
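A minimal sketch of the audit arithmetic, assuming a boto3 CloudWatch client (or anything with the same `get_metric_statistics` method) and that used space has already been pulled from the CloudWatch agent. The names `VolumeAudit`, `peak_consumed_iops` and `top_n_by_waste` are hypothetical, and ranking is by storage GiB only; ranking by estimated dollars would fold IOPS waste into the same list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class VolumeAudit:
    """Hypothetical per-volume audit record."""
    volume_id: str
    size_gib: float        # provisioned size
    used_gib: float        # filesystem used space, as reported by the CloudWatch agent
    provisioned_iops: int  # gp3/io1/io2: configured IOPS; gp2: 3 IOPS per GiB baseline
    peak_iops: float       # highest consumed IOPS over the audit window

    @property
    def storage_waste_gib(self) -> float:
        return max(0.0, self.size_gib - self.used_gib)

    @property
    def iops_waste(self) -> float:
        return max(0.0, self.provisioned_iops - self.peak_iops)


def peak_consumed_iops(cloudwatch, volume_id: str, start: datetime, end: datetime,
                       period: int = 3600) -> float:
    """Highest per-period IOPS: (VolumeReadOps + VolumeWriteOps) Sum / period seconds.

    One get_metric_statistics call returns at most 1,440 datapoints, so 30 days at an
    hourly period fits in one call; hourly averages smooth out shorter spikes.
    """
    ops_per_period: Dict[datetime, float] = {}
    for metric in ("VolumeReadOps", "VolumeWriteOps"):
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/EBS",
            MetricName=metric,
            Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
            StartTime=start,
            EndTime=end,
            Period=period,
            Statistics=["Sum"],
        )
        # Add read and write operations that fall in the same period.
        for point in resp["Datapoints"]:
            ts = point["Timestamp"]
            ops_per_period[ts] = ops_per_period.get(ts, 0.0) + point["Sum"]
    return max(ops_per_period.values(), default=0.0) / period


def top_n_by_waste(audits: List[VolumeAudit], n: int = 10) -> List[VolumeAudit]:
    """Largest absolute storage waste (provisioned minus used GiB) first."""
    return sorted(audits, key=lambda a: a.storage_waste_gib, reverse=True)[:n]
```

In practice the first argument would be `boto3.client("cloudwatch")`; taking the client as a parameter keeps the ranking logic testable without AWS credentials.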

Right-sizing down

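Right-sizing down has rules. Storage utilisation under 50%, sustained for 30 days, marks a candidate. EBS cannot shrink a volume in place (modify-volume only increases size), so shrinking means creating a smaller volume and copying the data across; most teams do this at instance refresh rather than on a live instance. IOPS consumption that stays well below the provisioned figure means the IOPS provision can be lowered online, down to gp3's 3,000 IOPS baseline, which is included in the base price.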

Under 50% for 30 days. Candidate for resizing down; the sustained-utilisation rule.
Shrink at refresh. There is no in-place shrink, and copying to a new volume under a live workload is risky; replace the volume at the next instance refresh or deploy.
Drop IOPS provision. Provisioned IOPS above what the workload consumes get lowered, but not below gp3's 3,000 IOPS baseline, which costs nothing extra.
Per-volume rightsize plan. The plan documented per volume; supports staged execution across the fleet.
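The rules above can be sketched as a per-volume planning function. This is a sketch under stated assumptions: the 50% threshold and 30-day window come from the text, while the 60% target utilisation after shrinking, the 25% IOPS headroom and the names `plan_rightsize` and `RightsizePlan` are hypothetical choices, and the IOPS target assumes the volume ends up on gp3.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

GP3_BASELINE_IOPS = 3000  # included in the gp3 base price


@dataclass
class RightsizePlan:
    """Hypothetical plan record for one volume."""
    target_size_gib: Optional[int]  # None: leave the size alone
    target_iops: int


def is_shrink_candidate(daily_peak_utilisation: Sequence[float],
                        days: int = 30, threshold: float = 0.5) -> bool:
    """Sustained rule: the last `days` daily peaks all sit below `threshold`."""
    recent = list(daily_peak_utilisation)[-days:]
    return len(recent) >= days and max(recent) < threshold


def plan_rightsize(size_gib: int, daily_peak_utilisation: Sequence[float],
                   provisioned_iops: int, peak_iops: float,
                   days: int = 30,
                   target_utilisation: float = 0.6,  # assumed headroom, not from the source
                   iops_headroom: float = 1.25) -> RightsizePlan:
    target_size = None
    if is_shrink_candidate(daily_peak_utilisation, days):
        # Size so the observed peak lands at target_utilisation, rounded up to a whole
        # GiB and never larger than today; the new volume is built at instance refresh.
        peak_used_gib = max(list(daily_peak_utilisation)[-days:]) * size_gib
        target_size = min(size_gib, max(1, math.ceil(peak_used_gib / target_utilisation)))
    # Only ever lower IOPS here, and never below the free gp3 baseline.
    wanted_iops = math.ceil(peak_iops * iops_headroom)
    target_iops = max(GP3_BASELINE_IOPS, min(provisioned_iops, wanted_iops))
    return RightsizePlan(target_size, target_iops)
```

Sizing from the 30-day peak rather than today's usage is the conservative choice: a volume that dipped briefly is not shrunk below what it needed a fortnight ago.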

Upgrade gp2 to gp3

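gp3 is the better default over gp2 for nearly every workload: the same SSD-backed storage, a 3,000 IOPS and 125 MiB/s baseline regardless of size, and a lower price per GB. Migrating an existing volume is a single modify-volume API call, applied online with no detach and no downtime; performance often improves and storage cost drops by up to 20%. The case to check is large gp2 volumes: above 1,000 GiB, gp2's baseline exceeds 3,000 IOPS, so the gp3 target needs IOPS provisioned to match. The reason teams haven't migrated is that AWS does not do it for them; it requires explicit action.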

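gp3 the better default. Same storage, higher baseline performance for volumes under 1,000 GiB, lower cost per GB.
Single API call migration. modify-volume; online, no downtime; mechanically trivial. A volume cannot be modified again for six hours afterwards, so set type and IOPS in the same call.
Up to 20% cost drop. Performance often improves; the migration itself costs nothing, so the saving starts immediately.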
Per-fleet migration tracker. Documented gp2-to-gp3 progress; supports completing the migration at fleet scale.
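A minimal fleet-migration sketch, assuming a boto3 EC2 client (`describe_volumes` paginator with a `volume-type` filter, and `modify_volume`). The helper names are hypothetical, and the sketch matches IOPS only: larger gp2 volumes can reach 250 MiB/s against gp3's 125 MiB/s baseline, so throughput needs a separate check.

```python
from typing import Dict, Iterator, List, Tuple

GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, with a 100 IOPS floor and a 16,000 IOPS ceiling."""
    return max(100, min(16000, 3 * size_gib))


def gp3_modify_args(volume_id: str, size_gib: int) -> Dict:
    """ModifyVolume arguments that move a gp2 volume to gp3 without an IOPS regression."""
    args = {"VolumeId": volume_id, "VolumeType": "gp3"}
    baseline = gp2_baseline_iops(size_gib)
    if baseline > GP3_BASELINE_IOPS:
        # Above 1,000 GiB gp2's baseline beats gp3's free 3,000 IOPS; provision the match.
        args["Iops"] = baseline
    # Throughput is deliberately not set: check VolumeReadBytes/VolumeWriteBytes and
    # add "Throughput" if the workload needs more than 125 MiB/s.
    return args


def gp2_volumes(ec2) -> Iterator[Tuple[str, int]]:
    """Yield (volume_id, size_gib) for every gp2 volume visible to the client."""
    pages = ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
    )
    for page in pages:
        for vol in page["Volumes"]:
            yield vol["VolumeId"], vol["Size"]


def migrate_fleet(ec2) -> List[str]:
    """One modify_volume call per gp2 volume; returns the IDs touched, for the tracker."""
    touched = []
    for volume_id, size_gib in gp2_volumes(ec2):
        ec2.modify_volume(**gp3_modify_args(volume_id, size_gib))
        touched.append(volume_id)
    return touched
```

Against a real account the client would be `boto3.client("ec2")`, and running a small batch first is prudent since each volume is then locked against further modification for six hours.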

Typical savings

First-pass audits typically find 30-50% of EBS spend reducible without performance loss, mostly from oversized provisions and unnecessarily high IOPS. Recurring quarterly audits find another 5-10% each time as workloads grow and oversizing creeps back; automation tools generate recommendations that engineers approve and apply.

30-50% first-pass. Reducible without performance loss; mostly oversized provisions and unnecessarily high IOPS.
5-10% recurring quarterly. Workloads grow; oversizing creeps back; the audit is a recurring discipline.
Automation tools. AWS Compute Optimizer and third-party FinOps platforms generate recommendations; engineers approve and apply them.
Per-quarter savings tracking. Documented savings per cycle; supports continued investment in the discipline.
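Per-cycle savings tracking reduces to before/after cost arithmetic. The sketch below assumes us-east-1 list prices (gp2 $0.10 and gp3 $0.08 per GB-month, $0.005 per gp3 IOPS above 3,000, $0.04 per MiB/s above 125); those figures are assumptions to check against current regional pricing, and `Volume`, `monthly_cost` and `cycle_savings` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Tuple

# Assumed us-east-1 list prices in USD per month; verify for your region.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08
GP3_PER_IOPS_OVER_BASELINE = 0.005  # per provisioned IOPS above 3,000
GP3_PER_MIBS_OVER_BASELINE = 0.04   # per MiB/s above 125


class Volume(NamedTuple):
    volume_type: str
    size_gib: int
    iops: int = 3000
    throughput_mibs: int = 125


def monthly_cost(v: Volume) -> float:
    if v.volume_type == "gp2":
        return v.size_gib * GP2_PER_GB  # IOPS are bundled with size on gp2
    if v.volume_type == "gp3":
        return (v.size_gib * GP3_PER_GB
                + max(0, v.iops - 3000) * GP3_PER_IOPS_OVER_BASELINE
                + max(0, v.throughput_mibs - 125) * GP3_PER_MIBS_OVER_BASELINE)
    raise ValueError(f"no price assumed for {v.volume_type}")


def cycle_savings(before: Iterable[Volume], after: Iterable[Volume]) -> Tuple[float, float]:
    """Monthly USD saved by one audit cycle, and that saving as a fraction of old spend."""
    old = sum(monthly_cost(v) for v in before)
    new = sum(monthly_cost(v) for v in after)
    return old - new, ((old - new) / old if old else 0.0)
```

Under these assumed prices a 1,000 GiB gp2 volume moved unchanged to gp3 saves $20 a month (20%), while a 500 GiB gp2 volume that is also shrunk to 250 GiB of gp3 saves $30 (60%), which is why migration and rightsizing compound.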

What to watch out for

Three risks deserve attention: burst credit dynamics on gp2, where right-sizing can remove burst capacity needed during peaks; snapshot lineage, where deleting the replaced volume must not leave its snapshots unaccounted for; and the asymmetry between growing and shrinking. Expanding a volume and its filesystem is easy while contracting is hard, so plan for growth and right-size conservatively rather than aiming for 100% utilisation.

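Burst credit dynamics. gp2 earns I/O credits at its baseline of 3 IOPS per GiB and spends them to burst to 3,000 IOPS; a smaller gp2 volume drains credits faster during a burst and refills them more slowly, so peaks the old size absorbed can exhaust them. Moving to gp3 first removes the coupling.
Snapshot lineage. Deleting a volume does not delete its snapshots: they keep billing and still hold the restore history, while a replacement built by copying data starts a fresh snapshot chain. Decide retention for the old chain and verify it before deletion.
Expansion easy, contraction hard. EBS volumes and common filesystems grow online; EBS cannot shrink in place, ext4 shrinks only offline and XFS not at all. Plan for growth, right-size conservatively.
Per-rightsize verification. Each rightsize is verified to hold performance (latency, queue length, throughput) after the change; supports safe execution.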
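The gp2 burst risk can be quantified with the burst-duration relationship AWS documents for gp2: duration = credit balance / (3,000 - 3 x size in GiB), with a full bucket holding 5.4 million I/O credits. A minimal sketch, with hypothetical function names; the refill figure assumes the volume sits idle so credits accrue at the full baseline rate.

```python
import math

GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full gp2 bucket
GP2_BURST_IOPS = 3000


def gp2_baseline_iops(size_gib: int) -> int:
    """3 IOPS per GiB, 100 IOPS floor, 16,000 IOPS ceiling."""
    return max(100, min(16000, 3 * size_gib))


def burst_seconds(size_gib: int, credits: float = GP2_CREDIT_BUCKET) -> float:
    """How long a credit balance sustains 3,000 IOPS on a gp2 volume of this size."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return math.inf  # 1,000 GiB and above: the baseline already meets the burst rate
    return credits / (GP2_BURST_IOPS - baseline)


def refill_seconds(size_gib: int, credits: float = GP2_CREDIT_BUCKET) -> float:
    """Time to earn `credits` back while idle, at the baseline accrual rate."""
    return credits / gp2_baseline_iops(size_gib)
```

For example, shrinking a gp2 volume from 500 GiB to 100 GiB cuts a full-bucket burst from 60 minutes to about 33 and stretches an idle refill from 1 hour to 5 hours.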