Read Amplification


Overview

Read amplification is the gap between logical and physical reads: one logical read can produce many physical reads, and physical reads are what drive latency and IOPS cost. A query that returns "one row" might traverse the index, fetch from the heap, step over dead tuples (Postgres), or check multiple SST files (LSM-tree storage). The discipline lies in measuring per-query physical reads, designing indexes that minimise heap fetches, and using covering indexes on read-heavy paths.

Index design impact. The index path chosen for each query determines how many pages it touches; index-only scans can skip the heap fetch entirely.
LSM-tree storage. A single key lookup may touch multiple SST files; compaction tuning determines how many.
Heap fetches. Each row located through an index costs a heap lookup; in Postgres, unvacuumed dead tuples amplify those reads.
Index-only scans plus per-query measurement. Covering indexes (INCLUDE columns) let a query be answered from the index alone; per-query physical-read measurement reveals which queries amplify.
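The effect of a covering index can be seen in a query plan without a running Postgres server. The following is a minimal sketch using SQLite from the Python standard library, chosen only because it runs anywhere. The table, column, and index names are hypothetical. SQLite has no INCLUDE clause, so the extra columns are appended to the index key instead; in Postgres the equivalent would be CREATE INDEX ... INCLUDE (status, total).

```python
import sqlite3


def plan_for(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan step.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " status TEXT, total REAL, notes TEXT)"
)
query = "SELECT status, total FROM orders WHERE customer_id = 42"

# Plain index: the lookup finds the row via the index, then fetches
# status and total from the table itself.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_before = plan_for(conn, query)

# Covering index: every column the query needs lives in the index,
# so the table does not have to be visited.
conn.execute("DROP INDEX idx_orders_customer")
conn.execute(
    "CREATE INDEX idx_orders_customer_cover"
    " ON orders (customer_id, status, total)"
)
plan_after = plan_for(conn, query)

print("before:", plan_before)
print("after: ", plan_after)
```

The second plan should report a covering index, which is SQLite's way of saying the table lookup was skipped. The trade-off is a wider index that costs more to store and to maintain on writes, so it tends to be worth it only on read-heavy paths.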

The approach

The practical approach is to measure per-query physical reads (pg_stat_statements, EXPLAIN (ANALYZE, BUFFERS), or the equivalent in other engines), design covering indexes for read-heavy paths so queries hit only the index, monitor the buffer cache hit rate as the leading signal of amplification pressure, run VACUUM aggressively on Postgres tables where dead tuples accumulate, and document the per-table read strategy so the design is reviewable.
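As one way to make the measurement step concrete, the sketch below pulls the shared-buffer counters out of the text that Postgres prints for EXPLAIN (ANALYZE, BUFFERS). The sample plan is invented for illustration, and the helper name top_level_buffers is hypothetical. The byte figure assumes the default 8 kB block size.

```python
import re

# Hypothetical plan text in the shape Postgres prints for
# EXPLAIN (ANALYZE, BUFFERS); the numbers are made up.
SAMPLE_PLAN = """\
Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=16) (actual time=0.031..0.420 rows=1 loops=1)
  Index Cond: (customer_id = 42)
  Buffers: shared hit=3 read=5
Planning Time: 0.090 ms
Execution Time: 0.450 ms
"""

# Captures the "name=value" pairs that follow "Buffers: shared".
BUFFERS_RE = re.compile(r"Buffers: shared((?: \w+=\d+)+)")


def top_level_buffers(plan_text):
    """Return (hit, read) from the first Buffers line in the plan.

    The first Buffers line belongs to the top plan node, whose counts
    include those of its child nodes.
    """
    for line in plan_text.splitlines():
        match = BUFFERS_RE.search(line)
        if match:
            counts = dict(pair.split("=") for pair in match.group(1).split())
            return int(counts.get("hit", 0)), int(counts.get("read", 0))
    return 0, 0


hit, read = top_level_buffers(SAMPLE_PLAN)
# Assumes the default 8 kB Postgres block size.
print(f"cache hits: {hit}, blocks read: {read}, bytes read: {read * 8192}")
```

Here "hit" counts blocks found in shared buffers and "read" counts blocks that had to be requested from the operating system, so a query returning one row while touching eight blocks is already a visible amplification factor.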

Measure physical reads. Record buffer hits and disk reads per query; the data anchors the optimization conversation.
Covering indexes. INCLUDE columns avoid the heap fetch; in Postgres, queries are answered from the index alone as long as the visibility map marks the pages all-visible.
Monitor buffer cache. Track the cache hit rate per database; a rate below the agreed threshold means physical reads are dominating.
VACUUM discipline plus documented strategy. Aggressive VACUUM on high-write tables reduces dead tuples; the read pattern for each table is written down and committed for operational review.
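The measurement and monitoring steps above can be combined into a simple triage pass over pg_stat_statements. The sketch below is illustrative only: the SQL uses column names from the pg_stat_statements view, but the sample rows, the thresholds (100 blocks per call, 99% hit ratio), and the names StatementStats and flag_amplifying are assumptions, not recommendations.

```python
from dataclasses import dataclass

# Query one could run to fetch the raw counters; the LIMIT is arbitrary.
TOP_READERS_SQL = """
SELECT query, calls, shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20
"""


@dataclass
class StatementStats:
    query: str
    calls: int
    shared_blks_hit: int
    shared_blks_read: int

    @property
    def blocks_per_call(self):
        """Average blocks touched (cached or not) per execution."""
        total = self.shared_blks_hit + self.shared_blks_read
        return total / max(self.calls, 1)

    @property
    def hit_ratio(self):
        """Share of touched blocks served from shared buffers."""
        total = self.shared_blks_hit + self.shared_blks_read
        return self.shared_blks_hit / total if total else 1.0


def flag_amplifying(stats, max_blocks_per_call=100.0, min_hit_ratio=0.99):
    """Return statements over either (hypothetical) threshold, worst first."""
    flagged = [
        s for s in stats
        if s.blocks_per_call > max_blocks_per_call or s.hit_ratio < min_hit_ratio
    ]
    return sorted(flagged, key=lambda s: s.blocks_per_call, reverse=True)


# Made-up rows standing in for the result of TOP_READERS_SQL.
sample = [
    StatementStats("SELECT ... FROM users WHERE id = $1", 1000, 4000, 0),
    StatementStats("SELECT ... FROM events WHERE day = $1", 100, 90000, 10000),
    StatementStats("SELECT ... FROM orders WHERE customer_id = $1", 500, 9000, 1000),
]

for s in flag_amplifying(sample):
    print(f"{s.blocks_per_call:8.1f} blocks/call  hit ratio {s.hit_ratio:.2f}  {s.query}")
```

Ranking by blocks per call rather than total time keeps the focus on amplification itself: a query can be fast today only because its pages happen to be cached.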

Why this compounds

Read amplification discipline compounds across queries and tables. Each measured query reveals real physical IO; each covering index reduces ongoing IOPS for the queries it serves; the team builds intuition for which queries amplify and which do not. Without the discipline, slow-query investigations focus on logical query shape and miss the physical-IO patterns that actually drive cost and latency.

Query performance. Reduced amplification produces faster queries; user-facing latency drops wherever an index covers the query.
Cost efficiency. Less physical IO produces a lower IOPS bill; cost tracks logical query count rather than the amplification factor.
Resource utilization. Buffer cache reuse improves; hot data stays in memory rather than churning through the cache.
Institutional knowledge. Each measurement teaches database internals; the team learns where amplification hides.

Read amplification discipline pays off across years. Nova AI Ops integrates with database telemetry, surfaces amplification patterns, and supports the team’s database engineering discipline.