Cluster Naming Convention
Cluster names should be predictable.
Why naming matters
At ten clusters, naming feels like a non-issue. At fifty, ad-hoc names like cluster-prod-2 and main-east become a productivity tax because engineers constantly grep, look things up, and second-guess. Naming encodes context at a glance (environment, region, purpose, ownership), and consistency aids automation: a CI script that targets prod-* matches exactly the prod clusters.
- Productivity tax at scale. 50+ clusters with ad-hoc names; engineers grep and second-guess.
- Encodes context at a glance. Environment, region, purpose, ownership; the name tells the story.
- Aids automation. CI prod-* matches exactly; without convention, matches are fragile and dangerous.
- Per-incident triage faster. On-call reads name, knows which dashboards to open; the discipline pays.
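To make the automation point concrete, here is a minimal Python sketch of glob-style targeting. The prod-* pattern and the names cluster-prod-2 and main-east come from the text above; the fleet list and the select_clusters helper are hypothetical, and matching uses the standard-library fnmatch module rather than any particular CI system.

```python
from fnmatch import fnmatchcase


def select_clusters(names, pattern):
    """Return the cluster names that match a shell-style glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical fleet mixing conventional and ad-hoc names.
fleet = [
    "prod-us-east-1-app-1",
    "staging-eu-west-1-batch-1",
    "cluster-prod-2",  # ad-hoc: contains "prod" but does not start with "prod-"
    "main-east",       # ad-hoc: the environment is not visible in the name
]

prod_targets = select_clusters(fleet, "prod-*")
print(prod_targets)
```

In this sketch the ad-hoc cluster-prod-2 is silently skipped by prod-*, while a looser pattern such as *prod* would also catch any name that merely contains the string prod. That is the fragility the list above describes.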
The pattern
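The pattern is {env}-{region}-{purpose}-{n}. Examples: prod-us-east-1-app-1, staging-eu-west-1-batch-1. env is the environment (prod, staging, qa, dev); region matches the cloud provider’s region label exactly; purpose is one word from a small enumerated set (app, batch, ml, data); n is a numeric suffix that allows capacity expansion. A name such as dev-shared-1 omits the region and uses a purpose outside the set, so it does not fit the pattern as written and would have to be handled as a documented deviation.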
- Format. {env}-{region}-{purpose}-{n}; predictable structure.
- env values. prod, staging, qa, dev; small enumerated set.
- Region matches cloud label. Cloud provider region label exactly; supports automation.
- Purpose enumerated. app, batch, ml, data; small set keeps the convention readable.
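The format can be checked mechanically. The following is a sketch only, assuming Python and a regular expression; the env and purpose sets are the ones listed above, while KNOWN_REGIONS, ClusterName, and parse_cluster_name are hypothetical names introduced for illustration. Because region labels themselves contain hyphens, the expression anchors on env at the start and on purpose and n at the end, then checks the remaining middle part against the region list.

```python
import re
from typing import NamedTuple, Optional

ENVS = ("prod", "staging", "qa", "dev")
PURPOSES = ("app", "batch", "ml", "data")
# Assumption: a real check would load this set from the cloud provider's region list.
KNOWN_REGIONS = frozenset({"us-east-1", "eu-west-1"})

NAME_RE = re.compile(
    r"^(?P<env>{envs})-(?P<region>[a-z0-9-]+)-(?P<purpose>{purposes})-(?P<n>[1-9][0-9]*)$".format(
        envs="|".join(ENVS), purposes="|".join(PURPOSES)
    )
)


class ClusterName(NamedTuple):
    env: str
    region: str
    purpose: str
    n: int


def parse_cluster_name(name: str) -> Optional[ClusterName]:
    """Split a name into its four parts, or return None if it does not conform."""
    m = NAME_RE.match(name)
    if m is None or m["region"] not in KNOWN_REGIONS:
        return None
    return ClusterName(m["env"], m["region"], m["purpose"], int(m["n"]))


print(parse_cluster_name("prod-us-east-1-app-1"))
print(parse_cluster_name("dev-shared-1"))  # no region, purpose not in the set
```

Under these assumptions, prod-us-east-1-app-1 and staging-eu-west-1-batch-1 parse cleanly, while dev-shared-1 is rejected because it carries neither a region nor a purpose from the enumerated set.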
Beyond the name: tags
Tags carry the metadata the name cannot fit. team, owner, contact, cost-center, and expiry are queryable through cloud APIs. The naming convention plus the tagging convention is the full story: the name is the primary key, and tags are the metadata. IaC enforces both: the Terraform module rejects launches without a proper name and the required tags, and CI fails the PR if either is missing.
- Tag metadata. team, owner, contact, cost-center, expiry; queryable in cloud APIs.
- Name plus tags = full story. Name is primary key; tags are metadata.
- IaC enforcement. Terraform module rejects missing name or tags; CI fails PR.
- Per-tag policy. Each required tag has a documented policy; supports consistent enforcement.
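The required-tag rule is simple to express as a check. The sketch below is not the Terraform module described above; it is a Python illustration of the kind of validation a CI step might run. The tag keys are the five listed in this section; missing_tags and the sample tag values are hypothetical.

```python
REQUIRED_TAGS = ("team", "owner", "contact", "cost-center", "expiry")


def missing_tags(tags):
    """Return required tag keys that are absent or blank, in REQUIRED_TAGS order."""
    missing = []
    for key in REQUIRED_TAGS:
        value = tags.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Hypothetical tag sets, e.g. as read from a plan file or an inventory export.
complete = {
    "team": "payments",
    "owner": "alice",
    "contact": "payments-oncall@example.com",
    "cost-center": "cc-1234",
    "expiry": "2030-01-01",
}
incomplete = {"team": "payments", "owner": "  ", "contact": "payments-oncall@example.com"}

print(missing_tags(complete))
print(missing_tags(incomplete))
```

A CI job built on this idea would fail the PR whenever the returned list is non-empty. Per-tag policies, such as an accepted format for expiry, would be added as further checks on top of the presence test.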
Migration strategy
Existing clusters get renamed at their next replacement. Forcing immediate renames is disruptive, so treat the convention as the standard for new clusters. Document deviations explicitly: a cluster-old-prod-2 is allowed with a written exception so nobody is confused. A quarterly drift report flags clusters that don’t match the convention, and owners explain or rename.
- Rename at replacement. Existing clusters renamed at next replacement; immediate rename is too disruptive.
- Convention for new clusters. The standard applies to new launches; old clusters migrate over time.
- Documented deviations. Written exceptions for legitimately-named legacy clusters.
- Quarterly drift report. Non-matching clusters flagged; owners explain or rename.
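A drift report of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of an existing tool: drift_report, the fleet list, and the exceptions mapping are hypothetical, and the regular expression is a simplified form of the convention that does not validate the region against the provider's list.

```python
import re

# Simplified check for {env}-{region}-{purpose}-{n}; the region is not validated here.
CONVENTION = re.compile(
    r"^(prod|staging|qa|dev)-[a-z0-9-]+-(app|batch|ml|data)-[1-9][0-9]*$"
)


def drift_report(names, exceptions):
    """Sort cluster names into conforming, excepted (documented), and drift."""
    report = {"conforming": [], "excepted": [], "drift": []}
    for name in sorted(names):
        if CONVENTION.match(name):
            report["conforming"].append(name)
        elif name in exceptions:
            report["excepted"].append(name)
        else:
            report["drift"].append(name)
    return report


# Hypothetical inventory and exception register (name -> written reason).
fleet = ["prod-us-east-1-app-1", "cluster-old-prod-2", "main-east"]
exceptions = {"cluster-old-prod-2": "legacy cluster; rename at next replacement"}

report = drift_report(fleet, exceptions)
for category, members in report.items():
    print(category, members)
```

Only the drift bucket needs follow-up from owners. Names in the excepted bucket already have a written reason on file.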
Scaling considerations
At scale, the convention itself needs structure. Above 100 clusters, even good conventions hit limits, and you add finer-grained purpose values (prod-us-east-1-checkout-1 splits from prod-us-east-1-app-1). Cluster discovery becomes its own service, where an internal tool maps purposes to cluster names. Federation matters too, because multi-cluster control planes (Karmada, Anthos) use cluster names as identifiers.
- 100+ cluster scale. Add finer-grained purpose suffix; checkout splits from generic app.
- Cluster discovery as service. Internal tool maps purposes to names; engineers query the tool.
- Federation matters. Karmada, Anthos use cluster names as identifiers; convention is critical.
- Per-fleet naming review. Annual review at fleet scale; supports continued consistency.
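The discovery idea can be illustrated with a small in-memory index. This is a sketch, not the internal tool itself: build_index, find_clusters, and the fleet list are hypothetical, and the purpose field is deliberately widened to any lowercase word so that finer-grained values such as checkout are accepted.

```python
import re
from collections import defaultdict

# Purpose is any lowercase word here so finer-grained values like "checkout" parse.
NAME_RE = re.compile(
    r"^(?P<env>prod|staging|qa|dev)-(?P<region>[a-z0-9-]+)"
    r"-(?P<purpose>[a-z]+)-(?P<n>[1-9][0-9]*)$"
)


def build_index(names):
    """Map (env, purpose) to the sorted list of conforming cluster names."""
    index = defaultdict(list)
    for name in sorted(names):
        m = NAME_RE.match(name)
        if m:
            index[(m["env"], m["purpose"])].append(name)
    return dict(index)


def find_clusters(index, env, purpose):
    """Look up the clusters that serve a purpose in an environment."""
    return index.get((env, purpose), [])


# Hypothetical fleet after checkout has been split out of the generic app purpose.
fleet = [
    "prod-us-east-1-app-1",
    "prod-us-east-1-checkout-1",
    "prod-us-east-1-checkout-2",
    "staging-eu-west-1-batch-1",
]

index = build_index(fleet)
print(find_clusters(index, "prod", "checkout"))
```

Because the index is derived from the names alone, it only works while names follow the convention, which is also why federation tools that key on cluster names benefit from the same discipline.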