Database Sharding: When and How
Sharding is operationally expensive. Pick when forced; never for premature optimization.
When to shard
Read replicas first; vertical scale first; only shard when single-write-node is the bottleneck.
Most teams shard too early; the operational cost is high.
Four sharding patterns
- 1. Hash-based, even distribution.
- 2. Range-based, ordered queries efficient.
- 3. Geography-based, data residency.
- 4. Tenant-based, multi-tenant SaaS.
Migration cost
Migration: 6-12 months; involves application changes; backfill at scale.
Worth it only if scale demands it; not preemptively.
Application changes
Application: shard-key in every query; cross-shard queries hard; transactions across shards painful.
Tooling (Vitess, Citus) reduces but does not eliminate the pain.
Antipatterns
- Premature sharding. Operational cost without scale benefit.
- No shard key in queries. Performance disaster.
- Cross-shard transactions as default. Distributed-transaction pain.
What to do this week
Three moves. (1) Apply this pattern to your most-loaded table. (2) Measure query latency / write throughput before/after. (3) Document the win and the constraint so the next refactor inherits the knowledge.