MongoDB Sharding
Choose shard key.
Overview
MongoDB sharding distributes collections across shards for horizontal scale. The shard key choice is the most important decision in any sharded MongoDB deployment: the wrong shard key produces hot shards, jumbo chunks, and queries that fan out across every shard. The right shard key has high cardinality, even distribution under expected access patterns, and matches the queries the application actually runs.
- Choose shard key. The most important decision; high cardinality, even distribution, frequently used in queries.
- Hashed vs ranged. Hashed produces even distribution but breaks range queries; ranged supports range queries but risks hot shards.
- Chunk balancing. Mongo balances chunks automatically; the balancer keeps distribution even as data grows.
- Mongos routers plus config servers. Application connects to mongos (not directly to shards); config server replica set stores chunk metadata.
The approach
The practical approach is to plan the shard key against actual query patterns before sharding (the wrong choice locks in expensive migration), prefer hashed for write-heavy workloads where even distribution matters more than range queries, monitor chunk balance continuously to catch imbalance, watch for jumbo chunks (chunks that cannot split, indicating a shard-key cardinality problem), and document the per-collection shard-key rationale committed to the schema documentation.
- Plan the shard key. High cardinality, even distribution, frequently used in queries; the choice locks in for the collection’s lifetime.
- Hashed for even distribution. Default for write-heavy workloads; trades range query support for even distribution.
- Monitor balance. Per-shard chunk count and data size; imbalance surfaces hot shards before query latency does.
- Avoid jumbo chunks plus documented choice. Jumbo chunks signal shard-key cardinality issues; per-collection shard-key rationale committed to the schema documentation.
Why this compounds
Sharding mastery compounds across collections. Each correct shard key supports linear growth; each query that hits a single shard runs at unsharded speed; the team builds intuition for shard-key design that pays off on every new sharded collection. Without the discipline, hot shards and jumbo chunks become recurring incidents.
- Scale. Right shard key supports linear growth; the cluster scales with data volume rather than choking on hot shards.
- Query performance. Targeted queries hit single shards; mongos routes to the right shard rather than fanning out across all.
- Operational maturity. Each cluster operated grows expertise; the team learns the failure modes that only sharded clusters exhibit.
- Institutional knowledge. Each rebalance teaches MongoDB internals; the team builds vocabulary for sharded-database operation.
MongoDB sharding is an operational discipline that pays off across years. Nova AI Ops integrates with database telemetry, surfaces shard patterns, and supports the team’s database engineering discipline.