Postgres Index Strategy for High-Read SRE Workloads
A reliability engineer should not need to be a database expert to make the right index choice. Here is the decision tree that gets you 90% of the way there.
B-tree: the default that is usually right
B-tree is the default Postgres index. It supports equality and range queries on most data types, scales well, and is what you should reach for unless you have a specific reason not to. About 80% of indexes you create will be B-tree.
The use case. WHERE id = 42. WHERE created_at > '2026-01-01'. ORDER BY user_id. Anything where you query by value or range.
The composite gotcha. CREATE INDEX ON orders (user_id, status) works for WHERE user_id = X and for WHERE user_id = X AND status = Y but not for WHERE status = Y alone. Column order matters: the planner can use a composite index efficiently only when the query constrains a leading prefix of its columns.
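A minimal sketch of the leading-column rule (table and column names here are illustrative, not from a real schema):

```sql
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

-- Served by the index: both queries constrain the leading column.
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders WHERE user_id = 42 AND status = 'pending';

-- Not served efficiently: skips the leading column, so the planner
-- falls back to a sequential scan (or a full index scan).
SELECT * FROM orders WHERE status = 'pending';

-- If the status-only query matters, it needs its own index.
CREATE INDEX idx_orders_status ON orders (status);
```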
GIN: when you query inside a value
GIN (Generalized Inverted Index) is for "find rows where this column contains X", full-text search, JSONB key lookups, array membership. B-tree cannot answer "does this JSONB column contain key X" efficiently; GIN can.
The cost. GIN indexes are larger and slower to update than B-tree. Acceptable for read-heavy workloads where the query type demands it; brutal for write-heavy ones.
Example. CREATE INDEX ON events USING gin (payload jsonb_path_ops). The jsonb_path_ops operator class is faster and smaller than the default for typical JSONB containment queries; use it unless you need other operators.
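To make the operator-class tradeoff concrete, a sketch with a hypothetical events table (assumed to have a JSONB payload column):

```sql
CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);

-- Served by jsonb_path_ops: containment (@>).
SELECT * FROM events WHERE payload @> '{"type": "deploy"}';

-- NOT served by jsonb_path_ops: key-existence (?) requires the
-- default jsonb_ops operator class instead.
SELECT * FROM events WHERE payload ? 'type';
```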
BRIN: when the data is naturally sorted
BRIN (Block Range INdex) stores summaries of physical blocks, not individual rows. It is tiny and fast to build. It works only when the column value is correlated with physical row order; the canonical case is a timestamp on an append-only table.
The win. A BRIN index on a billion-row events table can be 100x smaller than B-tree while still pruning 99% of blocks for a date-range query. The tradeoff is no help for point lookups.
The trap. If your data gets reshuffled (large updates, vacuum-full, table rewrites), BRIN selectivity drops and queries get slow without warning. Re-summarise after big rewrites.
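A sketch of creating and maintaining a BRIN index (table and index names are illustrative; pages_per_range is the knob that trades index size against pruning precision):

```sql
-- Tiny index over block-range summaries on an append-only table.
CREATE INDEX idx_events_created_brin ON events
  USING brin (created_at) WITH (pages_per_range = 64);

-- After a large rewrite or bulk load, summarise any block ranges
-- that autovacuum has not yet covered.
SELECT brin_summarize_new_values('idx_events_created_brin');
```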
Partial: when most rows are irrelevant
CREATE INDEX ON orders (created_at) WHERE status = 'pending'. The index includes only pending orders. If 0.1% of orders are pending and you query mostly that subset, the partial index is dramatically smaller and faster than indexing the whole table.
The use case. Soft-deleted rows excluded. Status flags where one value dominates. Tenants in a multi-tenant table where one tenant is > 90% of rows and you want to query the others efficiently.
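The use cases above can be sketched as follows (schemas are hypothetical). One gotcha worth knowing: the planner only considers a partial index when the query's WHERE clause provably implies the index predicate, so queries must repeat the filter:

```sql
-- Status flag where one value dominates: index only the rare subset.
CREATE INDEX idx_orders_pending ON orders (created_at)
  WHERE status = 'pending';

-- Soft-delete pattern: exclude deleted rows from the index entirely.
CREATE INDEX idx_users_active_email ON users (email)
  WHERE deleted_at IS NULL;

-- Uses idx_orders_pending: the predicate matches.
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-01-01';
```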
Covering: when you can avoid the heap
An ordinary index stores only the columns it indexes. A covering index also stores additional columns so the query can be answered from the index alone, without visiting the table heap. CREATE INDEX ON orders (user_id) INCLUDE (status, total_cents).
The win. Index-only scans can be 5-50x faster than index-then-heap for the same query because they skip the per-row heap fetch and its potential disk read. The tradeoff is index size.
The visibility-map gotcha. Index-only scans only work if Postgres knows all rows in the matching pages are visible, which depends on autovacuum keeping the visibility map fresh. On busy tables, tune autovacuum aggressively or the optimisation does not kick in.
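A sketch of verifying that the covering index actually delivers index-only scans (names are illustrative):

```sql
CREATE INDEX idx_orders_user_covering ON orders (user_id)
  INCLUDE (status, total_cents);

-- Every selected column is in the index, so this is a candidate
-- for an index-only scan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, total_cents FROM orders WHERE user_id = 42;

-- In the plan, look for "Index Only Scan" and a low "Heap Fetches"
-- count. A high Heap Fetches count means the visibility map is stale
-- and autovacuum needs tuning before the optimisation pays off.
```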
EXPLAIN ANALYZE, the pattern
Three things to look for, in order. (1) Estimated vs actual rows: if estimate is off by 10x or more, the planner is making bad choices and statistics need updating. (2) Index Scan vs Seq Scan: if you expected an index and got a sequential scan, the planner thinks the table is too small or the index too costly. (3) Buffers: shared hits are good (cache); shared reads are slow (disk).
Always use EXPLAIN (ANALYZE, BUFFERS) in production-equivalent settings. The plan can change between dev and prod because data distribution changes; you have to test against representative data.
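The three-step reading above, applied to a hypothetical query:

```sql
-- BUFFERS shows cache hits vs disk reads alongside the row estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2026-01-01';

-- (1) Compare "rows=" (planner estimate) against "actual ... rows=";
--     a 10x gap suggests running ANALYZE orders; to refresh statistics.
-- (2) Check the node type: Index Scan vs Seq Scan.
-- (3) Check Buffers: "shared hit" is cache, "shared read" is disk.
```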
Antipatterns
Indexing every column. Each index slows writes. Production databases commonly accumulate a substantial share of indexes that are rarely or never used. Drop the unused ones; pg_stat_user_indexes.idx_scan is your friend.
Indexing low-cardinality columns alone. A B-tree on a boolean is rarely useful; the planner will sequential-scan anyway. Use partial indexes with the value filtered in the WHERE clause instead.
Forgetting about index bloat. Heavy update workloads bloat B-tree indexes over time. REINDEX CONCURRENTLY on a quarterly cadence keeps them tight; the alternative is gradual query-time creep.
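A sketch of the maintenance pattern (the index name is illustrative; note REINDEX CONCURRENTLY requires Postgres 12+ and cannot run inside a transaction block):

```sql
-- Watch index sizes over time; steady growth on a stable row count
-- is a bloat signal.
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

-- Rebuild without blocking reads or writes.
REINDEX INDEX CONCURRENTLY idx_orders_user_status;
```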
What to do this week
Three moves. (1) Run SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0 and identify unused indexes for review. (2) Pick the slowest top-5 query from pg_stat_statements and EXPLAIN ANALYZE it; identify the index gap. (3) Schedule a quarterly REINDEX CONCURRENTLY for the three largest indexes on hot tables.
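Moves (1) and (2) as runnable queries. One caveat on (1): idx_scan counters reset when statistics are reset (and on crash recovery), so confirm an index is genuinely unused before dropping it. Move (2) assumes the pg_stat_statements extension is installed; the total_exec_time column is the Postgres 13+ name:

```sql
-- (1) Candidate unused indexes for review.
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

-- (2) Top 5 queries by cumulative execution time.
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```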