Database Sharding: When and How
Sharding is operationally expensive. Pick when forced; never for premature optimization.
When to shard
Sharding is the largest reversibility tax in database operations. Reach for it last, not first; the simpler scaling levers usually have more headroom than teams assume.
- Read replicas first. Read scaling is cheap; horizontal read replicas absorb most read-heavy workloads.
- Vertical scale first. Modern instance types reach 128 vCPU and 4TB RAM; many workloads never need sharding.
- Shard when forced. Single-write-node throughput is the bottleneck; replicas and bigger instances cannot help.
- Operational cost. Most teams shard too early and pay the operational cost without the scale benefit.
Four sharding patterns
- 1. Hash-based, even distribution.
- 2. Range-based, ordered queries efficient.
- 3. Geography-based, data residency.
- 4. Tenant-based, multi-tenant SaaS.
Migration cost
Sharding migrations measure in quarters, not weeks. Plan for the long path and stage the work so production stays usable throughout.
- Timeline. 6 to 12 months for a real workload; longer if the application has not been written with sharding in mind.
- Application changes. Every query path needs shard-key awareness; this is where most of the calendar goes.
- Backfill at scale. Existing data has to redistribute across shards; throttle the backfill to protect production.
- Forcing function. Only worth doing when scale demands it; never preemptively, and never as a status move.
Application changes
Sharding moves complexity from the database into the application. Plan for the shape of that complexity before you commit.
- Shard key in every query. Every read and write must carry the shard key; queries without it fan out to all shards.
- Cross-shard queries. Hard to do efficiently; pre-aggregate or denormalise to keep them rare.
- Cross-shard transactions. Distributed transactions are painful; redesign to avoid them where possible.
- Tooling helps. Vitess (MySQL) and Citus (Postgres) abstract some pain; they do not eliminate the cross-shard cases.
Antipatterns
- Premature sharding. Operational cost without scale benefit.
- No shard key in queries. Performance disaster.
- Cross-shard transactions as default. Distributed-transaction pain.
What to do this week
Three moves. (1) Apply this pattern to your most-loaded table. (2) Measure query latency / write throughput before/after. (3) Document the win and the constraint so the next refactor inherits the knowledge.