Cluster Naming Convention

Cluster names should be predictable.

Why naming matters

At ten clusters, naming feels like a non-issue. At fifty, ad-hoc names like cluster-prod-2 and main-east become a productivity tax: engineers grep, look things up, and second-guess constantly. A good name encodes context at a glance (environment, region, purpose, ownership), and consistency aids automation: a CI script that targets prod-* matches exactly the prod clusters.

Productivity tax at scale. With 50+ clusters and ad-hoc names, engineers grep and second-guess.
Encodes context at a glance. Environment, region, purpose, ownership; the name tells the story.
Aids automation. A CI script targeting prod-* matches exactly the prod clusters; without a convention, glob matches are fragile and dangerous.
Faster incident triage. On-call reads the name and knows which dashboards to open; the discipline pays off.
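
The automation point can be made concrete with a small glob match. This is a minimal sketch, not an actual CI script: the fleet list is hypothetical, and it assumes that cluster-prod-2 is in fact a production cluster, which its name only suggests.

```python
from fnmatch import fnmatchcase

# Hypothetical fleet mixing conventional and ad-hoc names.
clusters = [
    "prod-us-east-1-app-1",
    "prod-eu-west-1-batch-1",
    "staging-eu-west-1-batch-1",
    "cluster-prod-2",  # ad-hoc; assumed to be a prod cluster
    "main-east",       # ad-hoc; environment not visible in the name
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names matching a shell-style glob, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


prod_targets = select("prod-*", clusters)
print(prod_targets)
```

The glob picks up only the two names that start with prod-. The ad-hoc cluster-prod-2 is silently skipped, which is the fragile case the convention is meant to remove.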

The pattern

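The pattern is {env}-{region}-{purpose}-{n}. Examples: prod-us-east-1-app-1, staging-eu-west-1-batch-1, dev-shared-1. env is the environment (prod, staging, qa, dev); region matches the cloud provider’s region label exactly; purpose is one word from a small enumerated set (app, batch, ml, data); n is a numeric suffix that allows capacity expansion. Region labels such as us-east-1 contain hyphens themselves, so tooling should match names against the known env and purpose values rather than split on every hyphen. Note that dev-shared-1 has no region and uses a purpose outside the listed set, so under the pattern as written it needs a documented exception.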

Format. {env}-{region}-{purpose}-{n}; predictable structure.
env values. prod, staging, qa, dev; small enumerated set.
Region matches cloud label. Cloud provider region label exactly; supports automation.
Purpose enumerated. app, batch, ml, data; small set keeps the convention readable.
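
The pattern above can be checked mechanically. The following is a minimal sketch of a strict parser, under two assumptions that are not in the convention itself: region labels are AWS-style (us-east-1, eu-west-1), and n is a positive integer without leading zeros. The names ClusterName and parse_cluster_name are illustrative only.

```python
import re
from typing import NamedTuple, Optional

ENVS = ("prod", "staging", "qa", "dev")
PURPOSES = ("app", "batch", "ml", "data")

# Assumption: AWS-style region labels such as us-east-1 or eu-west-1.
# Adjust this fragment for other cloud providers.
REGION = r"[a-z]{2}(?:-[a-z]+)+-\d+"

ENV_ALT = "|".join(ENVS)
PURPOSE_ALT = "|".join(PURPOSES)

NAME_RE = re.compile(
    rf"(?P<env>{ENV_ALT})-"
    rf"(?P<region>{REGION})-"
    rf"(?P<purpose>{PURPOSE_ALT})-"
    r"(?P<n>[1-9]\d*)"
)


class ClusterName(NamedTuple):
    env: str
    region: str
    purpose: str
    n: int


def parse_cluster_name(name: str) -> Optional[ClusterName]:
    """Parse a name against {env}-{region}-{purpose}-{n}; None if it does not fit."""
    m = NAME_RE.fullmatch(name)
    if m is None:
        return None
    return ClusterName(m["env"], m["region"], m["purpose"], int(m["n"]))


print(parse_cluster_name("prod-us-east-1-app-1"))
print(parse_cluster_name("dev-shared-1"))  # None under this strict reading
```

Matching against the enumerated env and purpose values keeps the hyphens inside the region label from confusing the parser. Under this strict reading, dev-shared-1 does not parse, because it has no region and shared is not in the purpose set.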

Beyond the name: tags

Tags carry the metadata the name cannot fit. team, owner, contact, cost-center, and expiry are queryable through cloud APIs. The naming convention plus the tagging convention is the full story: the name is the primary key, and tags are the metadata. IaC enforces both: the Terraform module rejects launches without a proper name and the required tags, and CI fails the PR if either is missing.

Tag metadata. team, owner, contact, cost-center, expiry; queryable in cloud APIs.
Name plus tags = full story. Name is primary key; tags are metadata.
IaC enforcement. Terraform module rejects missing name or tags; CI fails PR.
Per-tag policy. Each required tag has a documented policy; supports consistent enforcement.
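
The tag requirement can also be checked in CI, outside the Terraform module. This is a hedged sketch rather than the module's actual logic: it assumes all five tags listed above are required and that a blank value counts as missing. The function name missing_tags and the sample tag values are hypothetical.

```python
# Assumption: all five tags named in the convention are required.
REQUIRED_TAGS = ("team", "owner", "contact", "cost-center", "expiry")


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are absent or blank, in policy order."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


# Hypothetical tag set for a cluster under review.
candidate = {
    "team": "payments",
    "owner": "alice",
    "contact": "",  # present but blank, so treated as missing
    "cost-center": "cc-1234",
}

problems = missing_tags(candidate)
print(problems)  # ['contact', 'expiry']
```

A CI step could fail the PR whenever the returned list is non-empty and print the missing keys so the author knows what to add.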

Migration strategy

Existing clusters get renamed at their next replacement. Forcing immediate renames is disruptive, so treat the convention as the standard for new clusters. Document deviations explicitly: a cluster-old-prod-2 is allowed with a written exception so nobody is confused. A quarterly drift report flags clusters that don’t match the convention, and owners explain or rename.

Rename at replacement. Existing clusters renamed at next replacement; immediate rename is too disruptive.
Convention for new clusters. The standard applies to new launches; old clusters migrate over time.
Documented deviations. Written exceptions for legitimately-named legacy clusters.
Quarterly drift report. Non-matching clusters flagged; owners explain or rename.
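
The drift report described above could be produced along these lines. This is a sketch under assumptions: the regular expression is one strict reading of {env}-{region}-{purpose}-{n} with AWS-style region labels, and the fleet list, the exceptions set, and the function name drift_report are hypothetical.

```python
import re

# Assumption: strict reading of {env}-{region}-{purpose}-{n}, AWS-style regions.
CONVENTION = re.compile(
    r"(prod|staging|qa|dev)-[a-z]{2}(-[a-z]+)+-\d+-(app|batch|ml|data)-[1-9]\d*"
)


def drift_report(names, exceptions):
    """Sort non-conforming names into documented exceptions and flagged drift."""
    report = {"documented": [], "flagged": []}
    for name in sorted(names):
        if CONVENTION.fullmatch(name):
            continue  # conforming names do not appear in the report
        bucket = "documented" if name in exceptions else "flagged"
        report[bucket].append(name)
    return report


# Hypothetical inventory and written-exception list.
fleet = [
    "prod-us-east-1-app-1",
    "cluster-old-prod-2",
    "main-east",
    "staging-eu-west-1-batch-1",
]
written_exceptions = {"cluster-old-prod-2"}

report = drift_report(fleet, written_exceptions)
print(report)
```

Only the flagged list needs owner follow-up; the documented list is kept in the report so that exceptions stay visible and can be retired when the cluster is replaced.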

Scaling considerations

At scale, the convention itself needs structure. Above 100 clusters, even good conventions hit limits, and three things change. First, purpose values become finer-grained: prod-us-east-1-checkout-1 splits from prod-us-east-1-app-1. Second, cluster discovery becomes its own service, with an internal tool that maps purposes to cluster names. Third, federation matters, because multi-cluster control planes (Karmada, Anthos) use cluster names as identifiers.

100+ cluster scale. Add finer-grained purpose suffix; checkout splits from generic app.
Cluster discovery as service. Internal tool maps purposes to names; engineers query the tool.
Federation matters. Karmada, Anthos use cluster names as identifiers; convention is critical.
Per-fleet naming review. Annual review at fleet scale; supports continued consistency.
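
The discovery idea above can be sketched as a simple index from (env, purpose) to cluster names. This is an illustration only: the registry records, the name prod-eu-west-1-app-1, and the functions build_index and find_clusters are hypothetical, and a real tool would presumably load its records from the cloud API or IaC state.

```python
from collections import defaultdict

# Hypothetical registry records for illustration.
REGISTRY = [
    {"name": "prod-us-east-1-app-1", "env": "prod", "purpose": "app"},
    {"name": "prod-us-east-1-checkout-1", "env": "prod", "purpose": "checkout"},
    {"name": "prod-eu-west-1-app-1", "env": "prod", "purpose": "app"},
    {"name": "staging-eu-west-1-batch-1", "env": "staging", "purpose": "batch"},
]


def build_index(records):
    """Group cluster names by (env, purpose), each group sorted by name."""
    index = defaultdict(list)
    for record in records:
        index[(record["env"], record["purpose"])].append(record["name"])
    return {key: sorted(names) for key, names in index.items()}


def find_clusters(index, env, purpose):
    """Return the cluster names for an (env, purpose) pair, or an empty list."""
    return index.get((env, purpose), [])


index = build_index(REGISTRY)
print(find_clusters(index, "prod", "app"))
print(find_clusters(index, "prod", "checkout"))
```

Keeping env and purpose as explicit fields in the registry, rather than re-deriving them from the name, lets finer-grained purposes such as checkout be added without changing the lookup code.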