Kubernetes RBAC Best Practices 2026
Roles vs ClusterRoles, groups vs ServiceAccounts, policy-as-code that survives every team reorg. The RBAC habits that scale past 50 engineers.
Why RBAC drift kills you
RBAC drift is the silent failure mode of every multi-team cluster. Year one, you have three roles and they make sense. Year three, you have 200 roles, half of them granting cluster-admin for “debugging”, and no one remembers who needed what. The first audit reveals everyone has every permission, and undoing it is a six-month project.
The way it happens. A new team needs access; an SRE clones the closest existing role; broadens it “just in case”; ships. Repeat 200 times. Each step looks reasonable; the aggregate is a flat-permission surface where any compromised pod has the run of the cluster.
The cost when it breaks. A compromised pod with broad permissions can list every secret, exec into every container, and modify every workload. The blast radius of a breach is the union of every overly-broad role; in most clusters that’s effectively the whole environment.
Roles vs ClusterRoles
Two scopes; pick the smallest one that works.
Role is namespaced. The permissions only apply inside the namespace where the Role lives. A Role granting get on Pods in team-a grants nothing in team-b. This is the right answer for most application-level access.
ClusterRole is cluster-scoped. The permissions apply across all namespaces (when bound with ClusterRoleBinding) or in a specific namespace (when bound with RoleBinding). Use ClusterRoles for things that are inherently cluster-wide: node access, CRD management, namespace creation.
The pattern. Default to Role. Reach for ClusterRole only when you genuinely need to grant access across namespaces, or when the resource is cluster-scoped (Nodes, PersistentVolumes, ClusterRoles themselves). The 80/20: most application access should be Roles, not ClusterRoles.
The reusable-ClusterRole trick. Define a ClusterRole once (e.g. app-developer) and bind it with RoleBindings into many namespaces. The ClusterRole is the policy; the RoleBinding is the scoping. Lets you write the policy once and apply it consistently across teams.
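The reusable-ClusterRole trick can be sketched as a small generator. This is a minimal illustration, not a definitive setup: the rules inside app-developer, the team and group names, and the binding naming scheme are all assumptions made up for the example. It emits JSON rather than YAML only to stay within the Python standard library; kubectl apply -f accepts JSON manifests as well.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"

# Hypothetical policy: the rules below are illustrative, not a recommendation.
APP_DEVELOPER = {
    "apiVersion": f"{RBAC_API}/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "app-developer"},
    "rules": [
        {"apiGroups": [""],
         "resources": ["pods", "pods/log", "services", "configmaps"],
         "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"],
         "resources": ["deployments"],
         "verbs": ["get", "list", "watch", "patch"]},
    ],
}


def role_binding(namespace: str, group: str,
                 cluster_role: str = "app-developer") -> dict:
    """RoleBinding that scopes a ClusterRole to one namespace for one group."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{cluster_role}-{namespace}",
                     "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API}],
        # roleRef points at the ClusterRole; the binding's namespace is the scope.
        "roleRef": {"kind": "ClusterRole", "name": cluster_role,
                    "apiGroup": RBAC_API},
    }


def render(teams: dict[str, str]) -> dict:
    """Wrap the policy plus one RoleBinding per namespace in a v1 List."""
    bindings = [role_binding(ns, grp) for ns, grp in sorted(teams.items())]
    return {"apiVersion": "v1", "kind": "List",
            "items": [APP_DEVELOPER] + bindings}


def render_json(teams: dict[str, str]) -> str:
    """Serialize the List so it can be written to a file and reviewed in a PR."""
    return json.dumps(render(teams), indent=2)
```

Calling render_json({"team-a": "team-a-devs", "team-b": "team-b-devs"}) produces one policy and two bindings. Adding a team is a one-line change to the mapping, and the policy itself never gets cloned.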
Groups vs ServiceAccounts
Two principals; very different lifecycles.
Groups are humans. Bind to OIDC groups from your identity provider; the group membership comes from your IdP, not from Kubernetes. A user joining team-a in your IdP automatically gets the team-a permissions; no kubectl required.
ServiceAccounts are workloads. Each pod runs as a ServiceAccount; the ServiceAccount has its own permissions; the pod inherits them. The default ServiceAccount has minimal permissions; named ServiceAccounts get specific ones.
The rule. Never bind a Role to a human user directly; always go through a group. The group abstraction means you don’t have to update RBAC when someone changes teams; the IdP handles it.
The pod-level rule. Each workload gets its own ServiceAccount with the minimum permissions it needs. Don’t share ServiceAccounts across unrelated workloads; the blast radius of a compromised pod is its ServiceAccount’s permissions.
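A minimal sketch of the pod-level rule: one helper that produces a dedicated ServiceAccount, a Role, and the RoleBinding that ties them together. The helper name workload_rbac, the workload name, and the example rule are hypothetical; the manifests are plain dicts that can be serialized to JSON or YAML.

```python
RBAC_API = "rbac.authorization.k8s.io"


def workload_rbac(namespace: str, workload: str, rules: list[dict]) -> list[dict]:
    """Return [ServiceAccount, Role, RoleBinding] dedicated to one workload."""
    meta = {"name": workload, "namespace": namespace}
    service_account = {"apiVersion": "v1", "kind": "ServiceAccount",
                       "metadata": dict(meta)}
    role = {"apiVersion": f"{RBAC_API}/v1", "kind": "Role",
            "metadata": dict(meta), "rules": rules}
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": dict(meta),
        # ServiceAccount subjects carry a namespace and no apiGroup.
        "subjects": [{"kind": "ServiceAccount", "name": workload,
                      "namespace": namespace}],
        "roleRef": {"kind": "Role", "name": workload, "apiGroup": RBAC_API},
    }
    return [service_account, role, binding]


# Hypothetical workload that only needs to read a single ConfigMap.
EXAMPLE = workload_rbac(
    "team-a",
    "report-generator",
    [{"apiGroups": [""], "resources": ["configmaps"],
      "resourceNames": ["report-config"], "verbs": ["get"]}],
)
```

The workload's pod spec then has to set serviceAccountName to the same name; otherwise the pod keeps running as the namespace's default ServiceAccount and none of this applies.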
The audit story. Group bindings show up in the IdP; ServiceAccount bindings show up in the cluster. Audit them differently: groups via IdP reports; ServiceAccounts via kubectl get rolebinding,clusterrolebinding -A and policy review.
Policy-as-code patterns
RBAC YAML is just YAML; it belongs in a git repo, reviewed in PRs, applied by a controller. Three patterns work.
Pattern 1: namespace-per-team, RBAC-per-namespace. Each team gets a namespace; the RBAC for that namespace lives in teams/<name>/rbac.yaml. CODEOWNERS files require team approval to modify their own RBAC; cluster-admin approval for cross-namespace changes.
Pattern 2: GitOps with Argo CD or Flux. RBAC syncs from the repo to the cluster; drift detection alerts when someone runs kubectl apply outside of git. The cluster state is always the repo state; the audit log is the git log.
Pattern 3: OPA Gatekeeper or Kyverno guardrails. Policy admission controllers reject RBAC that violates rules, e.g. “no Role may grant secrets/* outside the secrets-team namespace”. Catches the broad-grant pattern before it lands.
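To make the example rule concrete, here is its logic as a plain Python predicate. This is not Gatekeeper or Kyverno syntax; a real policy would re-express the same check in Rego or in a Kyverno policy. It also assumes one reading of the rule, namely that any access to secrets or their subresources counts as a grant, and that a wildcard resource counts too.

```python
SECRETS_NAMESPACE = "secrets-team"  # namespace named in the example rule


def grants_secrets(rule: dict) -> bool:
    """True if an RBAC rule covers secrets in the core API group."""
    groups = rule.get("apiGroups", [])
    if "" not in groups and "*" not in groups:
        return False  # secrets live in the core ("") API group
    return any(
        res == "*" or res == "secrets" or res.startswith("secrets/")
        for res in rule.get("resources", [])
    )


def violates(manifest: dict) -> bool:
    """True if a Role outside SECRETS_NAMESPACE grants access to secrets."""
    if manifest.get("kind") != "Role":
        return False  # the example rule only talks about namespaced Roles
    namespace = manifest.get("metadata", {}).get("namespace", "default")
    if namespace == SECRETS_NAMESPACE:
        return False
    return any(grants_secrets(rule) for rule in manifest.get("rules") or [])
```

Note what the rule as worded does not cover: a ClusterRole that grants secrets, bound into a namespace with a RoleBinding, passes this check. A production guardrail would likely need a companion rule for that path.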
The minimum-viable setup. RBAC YAML in git, two-person PR review, CI lint that flags cluster-admin grants and verb-glob *. That alone catches 90% of the drift pattern; the rest is gatekeeping at the policy controller.
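The CI lint in the minimum-viable setup could look roughly like this. It is a sketch under assumptions: manifests arrive already parsed into dicts (loading the YAML files is left to whatever parser the pipeline already has), and the allowlist is keyed by a made-up "Kind/name" string rather than by file path.

```python
def lint(manifest: dict) -> list[str]:
    """Return human-readable findings for one parsed RBAC manifest."""
    findings = []
    kind = manifest.get("kind", "")
    name = manifest.get("metadata", {}).get("name", "<unnamed>")

    if kind in ("Role", "ClusterRole"):
        for index, rule in enumerate(manifest.get("rules") or []):
            for field in ("verbs", "resources"):
                if "*" in rule.get(field, []):
                    findings.append(
                        f"{kind}/{name}: rule {index} uses wildcard in {field}")

    if kind in ("RoleBinding", "ClusterRoleBinding"):
        if manifest.get("roleRef", {}).get("name") == "cluster-admin":
            findings.append(f"{kind}/{name}: binds cluster-admin")

    return findings


def lint_all(manifests: list[dict], allowlist: frozenset = frozenset()) -> list[str]:
    """Lint every manifest, skipping any whose "Kind/name" is allowlisted."""
    findings = []
    for manifest in manifests:
        ident = (f"{manifest.get('kind', '')}/"
                 f"{manifest.get('metadata', {}).get('name', '<unnamed>')}")
        if ident in allowlist:
            continue
        findings.extend(lint(manifest))
    return findings
```

In a pipeline, the job would print the findings and exit non-zero when the list is not empty, so the PR cannot merge until the grant is narrowed or explicitly allowlisted.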
Auditing the live cluster
Even with policy-as-code, the cluster will drift: emergencies, incident-time edits, dev experiments. Audit quarterly.
The basic query. Run kubectl get clusterrolebinding -o yaml and look for any subject that’s a real human (not a ServiceAccount or system component) bound to cluster-admin. Anyone in that list should be justified or removed.
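The basic query can be scripted so the quarterly audit is repeatable. The sketch below reads the JSON form of the same kubectl output. Treating every User or Group whose name does not start with system: as a human is a heuristic assumed for this example; adjust it to your IdP's naming.

```python
import json
import subprocess


def human_cluster_admins(crb_list: dict) -> list[tuple[str, str, str]]:
    """Return (binding name, subject kind, subject name) for likely humans
    bound to cluster-admin via a ClusterRoleBinding."""
    hits = []
    for binding in crb_list.get("items", []):
        if binding.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        for subject in binding.get("subjects") or []:
            is_person = subject.get("kind") in ("User", "Group")
            is_system = subject.get("name", "").startswith("system:")
            if is_person and not is_system:
                hits.append((binding["metadata"]["name"],
                             subject["kind"], subject["name"]))
    return hits


def fetch_cluster_role_bindings() -> dict:
    """Fetch all ClusterRoleBindings from the current kubectl context."""
    result = subprocess.run(
        ["kubectl", "get", "clusterrolebinding", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Running human_cluster_admins(fetch_cluster_role_bindings()) gives the list to justify or remove. It only covers ClusterRoleBindings; a RoleBinding that references cluster-admin inside one namespace would need a second pass.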
The deeper query. For each ServiceAccount, list its effective permissions (kubectl auth can-i --list --as=system:serviceaccount:<namespace>:<name> works without plugins; kubectl-who-can or rakkess also help). Sort by permission breadth; the top 10 broadest ServiceAccounts are the highest-blast-radius targets in a breach. Tighten them.
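One rough way to sort ServiceAccounts by breadth is to score the roles they are bound to. The scoring below is an assumption invented for this sketch (verbs times resources per rule, with arbitrary weights for wildcards and for cluster-wide bindings), not an established metric. It ignores aggregated ClusterRoles and group subjects such as system:serviceaccounts, so treat the ranking as a starting point for review. The inputs are the items arrays from kubectl get roles,clusterroles -A -o json and kubectl get rolebindings,clusterrolebindings -A -o json.

```python
from collections import defaultdict

WILDCARD_WEIGHT = 100      # assumption: "*" counts as 100 verbs or resources
CLUSTER_WIDE_FACTOR = 10   # assumption: a ClusterRoleBinding is 10x a RoleBinding


def rule_breadth(rule: dict) -> int:
    """Score one rule as (number of verbs) x (number of resources)."""
    verbs = rule.get("verbs", [])
    resources = rule.get("resources") or rule.get("nonResourceURLs") or []
    verb_score = WILDCARD_WEIGHT if "*" in verbs else len(verbs)
    resource_score = WILDCARD_WEIGHT if "*" in resources else len(resources)
    return verb_score * resource_score


def rank_service_accounts(roles: list[dict], bindings: list[dict]) -> list[tuple]:
    """Return [((namespace, name), score), ...] sorted broadest first."""
    breadth = {}
    for role in roles:
        meta = role["metadata"]
        key = (role.get("kind"), meta.get("namespace", ""), meta["name"])
        breadth[key] = sum(rule_breadth(r) for r in role.get("rules") or [])

    scores = defaultdict(int)
    for binding in bindings:
        ref = binding["roleRef"]
        # A Role is looked up in the binding's namespace; a ClusterRole has none.
        role_ns = (binding["metadata"].get("namespace", "")
                   if ref["kind"] == "Role" else "")
        score = breadth.get((ref["kind"], role_ns, ref["name"]), 0)
        if binding.get("kind") == "ClusterRoleBinding":
            score *= CLUSTER_WIDE_FACTOR
        for subject in binding.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                scores[(subject.get("namespace", ""), subject["name"])] += score

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Slicing the result with [:10] gives the ten candidates to tighten first.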
The drift detection. Diff the live cluster RBAC against the git repo. Anything in the cluster that isn’t in git is drift; either bring it into git or remove it. Most clusters have 5-20% drift after a year; the cleanup is a one-day project per quarter.
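The diff itself is mostly bookkeeping. The sketch below compares two lists of parsed RBAC manifests, one from git and one from the live cluster, by kind, namespace, and name, and compares only the fields that carry policy. The function names and the choice of compared fields are assumptions for this example. Two caveats: the comparison is order-sensitive, so reordered rules show up as changed, and Kubernetes' built-in roles and bindings will appear as cluster-only unless they are filtered out or recorded in git as a baseline.

```python
POLICY_FIELDS = ("rules", "subjects", "roleRef", "aggregationRule")


def identity(manifest: dict) -> tuple[str, str, str]:
    """Key a manifest by (kind, namespace, name); namespace is "" if absent."""
    meta = manifest["metadata"]
    return (manifest["kind"], meta.get("namespace", ""), meta["name"])


def policy(manifest: dict) -> dict:
    """Keep only policy-bearing fields, dropping server-populated metadata."""
    return {field: manifest[field]
            for field in POLICY_FIELDS
            if manifest.get(field) is not None}


def rbac_drift(git: list[dict], live: list[dict]) -> dict[str, list]:
    """Classify differences between the repo and the live cluster."""
    in_git = {identity(m): policy(m) for m in git}
    in_live = {identity(m): policy(m) for m in live}
    return {
        "only_in_cluster": sorted(in_live.keys() - in_git.keys()),
        "only_in_git": sorted(in_git.keys() - in_live.keys()),
        "changed": sorted(key for key in in_git.keys() & in_live.keys()
                          if in_git[key] != in_live[key]),
    }
```

Everything under only_in_cluster and changed is drift in the sense above: bring it into git or remove it. Entries under only_in_git usually mean a sync failed or someone deleted an object by hand.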
Antipatterns
Wildcards on verbs or resources. verbs: ["*"] or resources: ["*"] in any Role except cluster-admin is almost always a mistake. Be explicit about what’s allowed.
cluster-admin for “just debugging.” Debugging is a use case for break-glass: a separately-audited, time-bounded role, not a permanent grant. The break-glass role logs every use; the team reviews and justifies those uses once a quarter.
Sharing ServiceAccounts across workloads. Each workload should have its own SA. Running every production pod under the namespace’s default SA is a smell: every pod has the same permissions, which means the breadth has to cover the union of all pod needs.
Manual RoleBinding edits in production. Every kubectl edit rolebinding in prod is a small audit failure. If you have to edit live RBAC, do it in git, push, and let the GitOps controller apply it.
What to do this week
Three moves. (1) Run kubectl get clusterrolebinding and find every human user bound to cluster-admin. Justify or remove. (2) Move RBAC into git if it isn’t already. Even without GitOps, having a repo with two-person review changes the culture. (3) Add a CI check that rejects verbs: ["*"] in any new Role unless the file is on an explicit allowlist. The combination of those three locks in 80% of the long-term hygiene.