Kubernetes Multi-Tenancy Patterns
Namespace-as-tenant, virtual cluster, full cluster-per-tenant. Three isolation models with very different cost and blast-radius trade-offs; pick the wrong one and you’ll regret it within a quarter.
Why multi-tenancy is hard
Kubernetes was designed for trusted multi-tenancy, multiple teams in one company sharing a cluster. It was not designed for hostile multi-tenancy, multiple paying customers running unknown workloads. The gap between “teams in the same org” and “tenants who might be malicious” is where most pain lives.
The hard truth. Two pods on the same node share a kernel. Container escapes are rare but not zero. If your tenants are external customers running their own code, namespace-level isolation isn’t enough: a kernel exploit gives one tenant access to everything on the node.
The other hard truth. Cluster-per-tenant is expensive. Control plane costs ($75/month minimum on EKS, similar elsewhere), node-pool minimums, observability sprawl. At 100 tenants you’re running 100 control planes, and the operational cost outstrips the compute cost.
The whole spectrum below is about choosing where on the cost-vs-isolation curve your team should sit.
Namespace-as-tenant
The simplest pattern. Each tenant gets a namespace; RBAC restricts them to their namespace; ResourceQuotas cap their usage; NetworkPolicies isolate their pods.
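A minimal sketch of the RBAC piece, assuming a tenant namespace called tenant-a and an identity-provider group called tenant-a-users (both names are illustrative):

```yaml
# Namespace-scoped admin rights for one tenant. Names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-admin
  namespace: tenant-a
rules:
  # Broad access to namespaced resources, nothing cluster-scoped.
  - apiGroups: ["", "apps", "batch", "networking.k8s.io"]
    resources: ["*"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-admin-binding
  namespace: tenant-a
subjects:
  - kind: Group
    name: tenant-a-users          # mapped from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-admin
  apiGroup: rbac.authorization.k8s.io
```

Binding the built-in admin ClusterRole via a RoleBinding achieves the same effect; either way, the grant stops at the namespace boundary.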
The strengths. Cheap: one cluster, many namespaces, one shared control plane. Operationally simple: the same monitoring stack covers everyone. Familiar: every Kubernetes admin knows namespaces.
The weak spot. Soft isolation. Tenants share the API server, the control plane, and the kernel on each node. A noisy tenant can degrade the API server for everyone. A kernel exploit gives one tenant access to all others. CRDs are cluster-scoped: if one tenant installs a CRD, every tenant sees it.
Where it works. Internal teams that mostly trust each other; SaaS apps where tenants are rows in a database, not separate compute environments; dev/staging environments. The pattern that fits 80% of multi-team companies.
The hardening. Pod Security Standards (restricted profile by default), NetworkPolicy default-deny, ResourceQuota on every namespace, LimitRange to prevent the “no-limits” pod. With those four, namespace-tenancy is solid for trusted-team use.
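What those four look like on one tenant namespace, as a sketch with placeholder numbers you’d tune per tier:

```yaml
# Pod Security Standards: enforce the restricted profile on the namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    pod-security.kubernetes.io/enforce: restricted
---
# Cap total consumption so one tenant can't starve the cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
# Default requests/limits for the pod that doesn't set any.
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
---
# Default-deny: no ingress or egress unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```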
Virtual cluster
The middle ground. Tools like vCluster or Kamaji run a separate Kubernetes API server inside a host cluster, giving each tenant the appearance of their own cluster while sharing nodes underneath.
The strengths. Each tenant gets their own API surface, their own CRDs, their own RBAC tree, their own admission controllers. The host cluster is the “node pool”; the virtual clusters are independent control planes. Tenants can do things that would break shared-namespace tenancy: install operators, define cluster-scoped resources, run their own admission webhooks.
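For example, a tenant can register a new resource type inside their virtual cluster without any other tenant ever seeing it. A minimal sketch, applied against the virtual API server (the group and kind are invented for illustration):

```yaml
# Applied to the *virtual* API server: this CRD exists for this tenant only.
# Group, kind, and schema here are made up for the example.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.tenant.io
spec:
  group: example.tenant.io
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
```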
The weak spot. The kernel is still shared. Workloads still run on host nodes; container escapes are still a concern. The virtual cluster gives API isolation, not workload isolation.
Where it works. Internal platform teams that need to give product teams “cluster-feeling” access without the cost of a real cluster; companies with strict CRD-isolation needs; dev environments where each engineer wants their own “cluster”.
The cost. The virtual control planes consume memory and CPU on the host; budget 200-500 MB of memory per virtual cluster. At 50 tenants that’s 10-25 GB just for control planes, a non-trivial slice of a single host cluster, but still far cheaper than 50 real clusters.
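One way to keep that overhead bounded, assuming each virtual cluster runs in its own host namespace, is a quota on that host namespace covering the virtual control plane plus everything synced into it. A sketch with placeholder numbers:

```yaml
# Quota on the *host* namespace hosting one virtual cluster.
# Covers the virtual control plane pod plus all synced tenant workloads.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: vcluster-tenant-a
  namespace: tenant-a            # host namespace where the virtual cluster runs
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi         # leave ~0.5Gi of this for the control plane itself
    limits.memory: 16Gi
    pods: "40"
```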
Cluster-per-tenant
Hard isolation. Each tenant gets their own cluster, their own control plane, their own nodes, their own everything.
The strengths. The blast radius of a tenant breach is exactly that tenant’s cluster. Kernel exploits don’t cross tenant boundaries. Compliance is easier to argue (one cluster, one tenant’s data). Tenants get true control plane isolation: their own CRDs, their own admission webhooks, their own etcd.
The weak spot. Cost and operations. The EKS control plane is $75/month/cluster. Node-pool minimums add another $100-200/month/cluster. An observability stack per cluster is another $50-100. Conservative all-in: $250/month/cluster at the low end. At 100 tenants, that’s $25k/month before any actual workload.
Where it works. Tenants who pay enough to justify it (enterprise SaaS, regulated industries); tenants who genuinely run hostile workloads; air-gapped or sovereign-cloud requirements. Not for free-tier or low-paying tiers.
The operations story. The killer is observability and upgrades. 100 clusters means 100 upgrade windows; the team needs heavy automation to survive. Tools like Cluster API and Crossplane are table stakes; without them, cluster-per-tenant doesn’t scale past 20-30 clusters.
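For a flavour of what that automation manages, here is a minimal Cluster API sketch for one tenant cluster. The AWS provider is assumed, API versions vary by release, and the KubeadmControlPlane and AWSCluster objects it references aren’t shown:

```yaml
# One Cluster API object per tenant; the referenced control-plane and
# infrastructure objects (not shown) do the actual provisioning.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tenant-acme
  namespace: tenants
  labels:
    tenant: acme                 # illustrative label for fleet tooling
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: tenant-acme-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: tenant-acme
```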
The cost-vs-blast-radius math
The simplest framing: how much does each tenant pay you, and how big is the cost-of-isolation per tenant?
If tenants pay $0-100/month, you cannot afford cluster-per-tenant. Namespace-as-tenant or virtual cluster are the options.
If tenants pay $100-1000/month, virtual cluster is the sweet spot. The marginal cost of a vCluster is low; the API isolation is meaningful; the operations scale to hundreds of tenants.
If tenants pay $1000+/month, cluster-per-tenant becomes affordable. The $250/month/cluster fixed cost is at most 25% of revenue; the isolation story is real; the compliance story is real.
The other axis: blast-radius cost. If a single tenant breach would cost the company $1M+ in incident response and reputation, the math shifts. Even at $100/tenant/month in revenue, it might be worth running them in their own cluster: you’re not optimising revenue, you’re optimising downside.
The hybrid pattern
Most mature platforms run multiple isolation tiers.
The setup. Basic tier in a shared cluster (namespace-as-tenant). Paid tier in a virtual cluster (vCluster on a shared node pool). Enterprise tier in a dedicated cluster. Tenants graduate as they pay more; the pricing reflects the cost-of-isolation.
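One way to encode the tiers on the shared-cluster side is a homegrown label convention that quotas and policies key off; the label key, values, and numbers below are invented for illustration:

```yaml
# Hypothetical tiering convention: the label is ours, not a Kubernetes standard.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  labels:
    example.com/tenant: tenant-a
    example.com/tier: basic        # basic | paid | enterprise
---
# Basic-tier quota; paid and enterprise tenants live in vClusters or their
# own clusters, so only the basic tier gets this shared-cluster cap.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: basic-tier-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    pods: "20"
```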
The marketing pitch. “Isolation level” becomes a feature. Tenants who care will pay for the dedicated tier; tenants who don’t stay in the shared tier. Every tier has a clear upgrade path.
The operations. Each tier has its own runbook; the platform team supports three modes. The complexity is real but bounded: you’re not running 1000 unique configurations, you’re running three tiers with N tenants in each.
What to do this week
Three moves. (1) For your current tenancy model, write down the failure modes. “If tenant A’s pod escapes, what does it touch?” Most teams haven’t actually answered this. (2) Cost the next tier up: if you’re on namespace-as-tenant, what does virtual cluster cost? Often surprisingly little; the migration is the cost, not the runtime. (3) If you’re running cluster-per-tenant for everyone, ask whether you should be running multi-tier; the bottom-tier tenants are subsidising operational cost they don’t need.