Write Amplification
SSD considerations.
What it is
Write amplification is the number of bytes written to physical storage per byte of application data. It is almost always higher than 1 because filesystem journals, database WALs, indexes, and replication each multiply the write. SSDs care because high amplification shortens lifespan; modern wear-levelling helps, but total write volume still drives endurance.
- Higher than 1 by design. The filesystem journal, database WAL, and replication factor each add to the total. The number is a multiplier that varies with the workload, not a constant.
- SSDs care. Every physical write uses up part of the drive's endurance, so high amplification shortens SSD lifespan measurably.
- Wear levelling helps. Modern SSDs even out writes across cells, but the OS and filesystem still control total write volume.
- Amplification factor per workload. Measure the actual ratio for each workload. The discipline starts with knowing the number.
Sources
Four sources dominate the amplification budget. Each is fixable; each carries its own trade-off. Knowing which source dominates tells the team where to invest.
- Filesystem journals. Journaled changes are written twice: to the journal first, then to their final location. The journal mode controls the cost: on ext4, data=journal journals file contents as well as metadata, while data=ordered and data=writeback journal metadata only.
- Database WALs. Each change goes to the write-ahead log first, then to the data file. WAL configuration controls fsync behaviour and group commit.
- Replication factor. Each write goes to N replicas, so the replication factor (RF) directly multiplies the total.
- Index maintenance. Each UPDATE also writes to the indexes it affects. Indexes matter for read performance; they cost writes.
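How these sources combine can be sketched with a simple multiplicative model. This is an illustration under stated assumptions, not a measurement: the class name `WritePath`, its fields, and every default value are hypothetical, and the model treats index writes as extra bytes per row that the WAL, journal, and replication layers then multiply. Replication multiplies the total across the cluster rather than on a single device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritePath:
    """Hypothetical per-layer multipliers; every default is illustrative."""
    journal: float = 1.0      # 2.0 would model full data journaling
    wal: float = 2.0          # WAL record plus the later data-file write
    replication: int = 3      # replicas that receive each write
    indexes: int = 2          # indexes touched per row change
    index_cost: float = 0.5   # assumed bytes per index write, relative to the row

    def amplification(self) -> float:
        # Index writes add to the row write; the other layers multiply the sum.
        per_row = 1.0 + self.indexes * self.index_cost
        return per_row * self.wal * self.journal * self.replication


baseline = WritePath()
print(baseline.amplification())                          # 12.0
print(WritePath(replication=1).amplification())          # 4.0
print(WritePath(indexes=0).amplification())              # 6.0
```

Changing one field at a time shows which multiplier dominates for a given set of assumptions, which is the question the list above asks of a real system.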
Measuring
You cannot reduce what you have not measured. The amplification factor is a ratio between two numbers you can already get from standard tooling: device-level bytes written and application-level logical writes.
- iostat or per-device metrics. Total bytes written per device. The OS-level number.
- Application metrics. Bytes accepted from clients per app. The logical write rate.
- Ratio equals amplification factor. Divide physical bytes by logical bytes over the same interval, per workload. The discipline produces a real number.
- Workload baseline. Keep a historical baseline per workload. Drift surfaces as a change in the ratio.
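The ratio can be computed from two snapshots of the kernel's per-device counters and an application-side byte count. The sketch below assumes Linux, where `/proc/diskstats` reports sectors written as the tenth field of each line in 512-byte units. The function names, the sample snapshot lines, and the 20% drift tolerance are hypothetical choices for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_written(diskstats_text: str, device: str) -> int:
    """Return the cumulative 'sectors written' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then I/O counters; index 9 is sectors written
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise KeyError(f"device {device!r} not found")


def amplification(before: str, after: str, device: str, logical_bytes: int) -> float:
    """Physical bytes written between two snapshots, divided by logical bytes."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    delta = sectors_written(after, device) - sectors_written(before, device)
    return delta * SECTOR_BYTES / logical_bytes


def drifted(current: float, baseline: float, tolerance: float = 0.2) -> bool:
    """True when the ratio moved more than `tolerance` (a fraction) from baseline."""
    return abs(current - baseline) > tolerance * baseline


# Made-up snapshots for illustration; in practice read /proc/diskstats twice.
before = "   8       0 sda 1000 0 80000 500 2000 0 100000 900 0 700 1400"
after = "   8       0 sda 1200 0 96000 600 5000 0 300000 2100 0 1500 2700"

ratio = amplification(before, after, "sda", logical_bytes=25_600_000)
print(ratio)                          # 4.0
print(drifted(ratio, baseline=3.0))   # True
```

The logical byte count has to cover the same interval and the same workload as the two snapshots; on a device shared by several workloads the result is a blended ratio.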
Reducing amplification
Reducing amplification is mostly about cutting the multipliers: larger batches, compression, the right filesystem for the workload, and per-database tuning of the WAL settings that drive most of the cost.
- Larger batch sizes. Larger batches amortise per-write overhead. Throughput rises; amplification falls.
- Compression. Compressing payloads before they are written shrinks every downstream copy: WAL, journal, and replicas. The CPU cost is usually worth it.
- Modern filesystems. ZFS and btrfs are copy-on-write and handle amplification differently from journaling filesystems such as ext4. Pick the filesystem that matches the workload.
- Per-database tuning. WAL settings, fsync mode, group commit. Database-specific tuning often dominates filesystem-level choices.
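The batching effect can be illustrated with a small model in which each flush pays a fixed header and is rounded up to whole device blocks. The record size, header size, and block size below are assumptions chosen for the example, and `flush_amplification` is a hypothetical helper rather than part of any database or filesystem API.

```python
import math


def flush_amplification(records_per_batch: int,
                        record_bytes: int = 200,    # assumed logical record size
                        header_bytes: int = 32,     # assumed fixed cost per flush
                        block_bytes: int = 4096) -> float:
    """Physical bytes per logical byte when a batch is flushed in whole blocks."""
    if records_per_batch <= 0:
        raise ValueError("records_per_batch must be positive")
    logical = records_per_batch * record_bytes
    # Each flush writes the header plus the records, padded to a block boundary.
    blocks = math.ceil((logical + header_bytes) / block_bytes)
    return blocks * block_bytes / logical


for n in (1, 20, 100):
    print(n, round(flush_amplification(n), 3))
# 1 20.48
# 20 1.024
# 100 1.024
```

In this model a single 200-byte record costs a full 4096-byte block, while 20 records share that same block, so most of the gain arrives once a batch fills a block.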