K8s Pod Security Hardening Checklist
Default pod settings are too permissive. This checklist tightens them.
Must have
The default Kubernetes pod has more privileges than most workloads need. It can run as root, it has a writable root filesystem, it usually runs without a seccomp profile, and it keeps the container runtime's default set of Linux capabilities. The workload almost certainly does not need any of these privileges, but an attacker who compromises it can exploit every one of them. Pod security hardening tightens the defaults to match what real workloads actually need, which is much less.
What every production pod must have:
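- runAsNonRoot: true. The container must run as a non-root user, and the kubelet refuses to start it if it would run as UID 0. Either the image defines a non-root user or the pod sets runAsUser. If the image defines the user, prefer a numeric UID, because the kubelet cannot verify that a username is non-root. Compromise of a non-root container is much less serious than compromise of a root one because many container escapes and privilege escalations require root inside the container.
- readOnlyRootFilesystem: true. The container's root filesystem is mounted read-only. The application can write only to volumes mounted at known paths (for example an emptyDir for scratch data), not to arbitrary filesystem locations. An attacker who compromises the container cannot drop new binaries into /tmp, /usr/local/bin, or anywhere else on the root filesystem.
- allowPrivilegeEscalation: false. This sets the Linux no_new_privs flag, so the container process and its children cannot gain privileges through setuid/setgid binaries or file capabilities. Even if the image contains a setuid-root binary, running it does not escalate. The setting has no effect if the container is privileged or has CAP_SYS_ADMIN.
- Standard hardening, not heroic. These three settings are the floor. They cost little on workloads that do not need elevated privileges, which is most workloads. The main adjustment is giving applications that write scratch files a writable volume. They are also easy to enforce through admission policy. Kyverno or OPA Gatekeeper can require all three on every pod. The built-in Pod Security Admission covers runAsNonRoot and allowPrivilegeEscalation but not readOnlyRootFilesystem.
- Apply the "restricted" Pod Security Standard. Kubernetes defines three Pod Security Standards: privileged, baseline, and restricted. Restricted requires runAsNonRoot: true and allowPrivilegeEscalation: false. It also requires a RuntimeDefault or Localhost seccomp profile and dropping ALL capabilities. It does not require readOnlyRootFilesystem, so that setting still needs a separate policy. Labeling a namespace with pod-security.kubernetes.io/enforce: restricted makes the API server reject non-compliant pods; it validates pod specs rather than filling in the settings. Exceptions go in a dedicated namespace with a less strict label, or in exemptions in the admission configuration, not in per-pod overrides.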
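The three must-have settings can be checked mechanically. The sketch below is a minimal, hypothetical audit in Python. It assumes a pod spec has already been parsed from YAML or JSON into plain dictionaries, for example the `spec` field of `kubectl get pod -o json` output. The function name `must_have_violations` and the example images are illustrative, not part of any Kubernetes tool. The sketch honors one inheritance rule: runAsNonRoot may be set on the pod-level securityContext and overridden per container, while readOnlyRootFilesystem and allowPrivilegeEscalation exist only at container level.

```python
from typing import Any

Spec = dict[str, Any]  # a pod spec parsed from YAML/JSON into plain dicts


def must_have_violations(pod_spec: Spec) -> list[str]:
    """Return one message per must-have setting a container is missing."""
    problems: list[str] = []
    pod_ctx = pod_spec.get("securityContext", {})
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # runAsNonRoot may be set on the pod; a container-level value overrides it.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot is not true")
        # These two fields exist only on the container-level securityContext.
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem is not true")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation is not false")
    return problems


# Hypothetical hardened spec: non-root at pod level, the other two per container,
# plus an emptyDir for scratch files because the root filesystem is read-only.
hardened = {
    "securityContext": {"runAsNonRoot": True, "runAsUser": 10001},
    "containers": [{
        "name": "app",
        "image": "registry.example.com/app:1.0",
        "securityContext": {
            "readOnlyRootFilesystem": True,
            "allowPrivilegeEscalation": False,
        },
        "volumeMounts": [{"name": "scratch", "mountPath": "/scratch"}],
    }],
    "volumes": [{"name": "scratch", "emptyDir": {}}],
}
defaults = {"containers": [{"name": "app", "image": "registry.example.com/app:1.0"}]}

print(must_have_violations(hardened))  # []
print(must_have_violations(defaults))  # three messages, one per missing setting
```

The same logic could run in CI against rendered manifests, before anything reaches the cluster's admission policy.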
The "must have" list is non-negotiable for production workloads. The cost is a few YAML lines per pod; the protection is dramatic.
Should have
The next tier of hardening is the "should have" list. These add defense in depth on top of the must-have baseline. Most workloads can adopt them with modest configuration; the security gain is significant.
- seccompProfile: RuntimeDefault. Seccomp is the kernel mechanism for filtering syscalls. The RuntimeDefault profile, supplied by the container runtime, blocks the syscalls that container workloads almost never legitimately need. An attacker inside the container has fewer syscalls to work with, which closes off many kernel-exploit paths.
- Drop all Linux capabilities, add only what you need. Set capabilities.drop to ALL, then add back specific capabilities only where needed, for example NET_BIND_SERVICE for binding to ports below 1024. Most workloads need none. Containers that keep the runtime's default set carry roughly a dozen capabilities they never use, such as CHOWN, SETUID, and NET_RAW.
- Use distroless or minimal base images. The base image determines which binaries an attacker has available after compromise. Distroless images (gcr.io/distroless) contain only the application and its runtime dependencies: no shell, no package manager, no debugging tools. An attacker who breaks in finds little to work with.
- Resource limits set. CPU and memory limits cap what a compromised or runaway container can consume on its node. The blast radius is bounded: one container going off-script cannot starve its neighbors or take down the node.
- Network policies applied. A default-deny egress policy, with explicit allow rules for what the pod needs, means a compromised container cannot reach the internet or arbitrary internal services. Default-deny ingress means it cannot be reached by pods that have no reason to talk to it. Network policies are the main defense against lateral movement. They are enforced only if the cluster's CNI plugin supports them.
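A similar hypothetical sketch covers the container-level items in the should-have list: seccomp, capabilities, and resource limits. Base images and network policies are not visible in a container spec, so they are out of scope here. The sketch assumes the same dictionary representation of a pod spec and treats a container-level seccompProfile as overriding the pod-level one. It accepts Localhost as well as RuntimeDefault, matching what the restricted standard allows. `ALLOWED_ADDED_CAPS` is an illustrative allow-list containing only NET_BIND_SERVICE; a real policy would tune it per workload.

```python
from typing import Any

Spec = dict[str, Any]  # a pod spec parsed from YAML/JSON into plain dicts

# Illustrative allow-list of capabilities that may be added back after dropping ALL.
ALLOWED_ADDED_CAPS = {"NET_BIND_SERVICE"}
# RuntimeDefault, or a custom node-local profile (also accepted by "restricted").
ACCEPTED_SECCOMP = {"RuntimeDefault", "Localhost"}


def should_have_violations(pod_spec: Spec) -> list[str]:
    """Check seccomp, capabilities, and resource limits for every container."""
    problems: list[str] = []
    pod_seccomp = pod_spec.get("securityContext", {}).get("seccompProfile", {})
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for c in containers:
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        # A container-level seccompProfile overrides the pod-level one.
        seccomp = ctx.get("seccompProfile", pod_seccomp)
        if seccomp.get("type") not in ACCEPTED_SECCOMP:
            problems.append(f"{name}: no RuntimeDefault/Localhost seccomp profile")
        caps = ctx.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: capabilities do not drop ALL")
        extra = set(caps.get("add", [])) - ALLOWED_ADDED_CAPS
        if extra:
            problems.append(f"{name}: unexpected added capabilities {sorted(extra)}")
        limits = c.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                problems.append(f"{name}: no {resource} limit")
    return problems


web = {
    "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{
        "name": "web",
        "securityContext": {
            "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]},
        },
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
    }],
}
print(should_have_violations(web))  # []
```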
The should-have list is where the team's security maturity shows. Adopting these takes a few quarters of incremental work; the cumulative protection is large.
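The network-policy item in the list above is usually implemented as two NetworkPolicy objects per namespace. The first selects every pod and denies all ingress and egress. The second re-opens DNS so pods can still resolve names. The sketch below generates both as a JSON List, which `kubectl apply -f -` accepts. The namespace name `payments` is hypothetical. The DNS rule assumes the cluster DNS pods run in `kube-system` with the label `k8s-app: kube-dns`; this is common for CoreDNS, but check your distribution.

```python
import json
from typing import Any


def default_deny_all(namespace: str) -> dict[str, Any]:
    """NetworkPolicy selecting every pod in the namespace with no allow rules."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = every pod in the namespace
            # Both types listed with no ingress/egress rules: deny all traffic.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_dns_egress(namespace: str) -> dict[str, Any]:
    """Re-open DNS (port 53) to the cluster DNS pods so name resolution works."""
    dns_peer = {
        # Assumption: DNS runs in kube-system with the label k8s-app=kube-dns.
        "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
    }
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [dns_peer],
                "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
            }],
        },
    }


if __name__ == "__main__":
    ns = "payments"  # hypothetical namespace
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": [default_deny_all(ns), allow_dns_egress(ns)]}
    # Pipe the output to: kubectl apply -f -
    print(json.dumps(bundle, indent=2))
```

NetworkPolicies are additive. Each further policy can only widen what the default-deny policy permits, so allow rules for each workload's real traffic are layered on top without editing the baseline.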
Avoid
The third list is what to avoid. These are the privilege escalations that should be rare and deliberate rather than routine. Workloads using them need a specific justification and a specific scope.
- privileged: true. A privileged container has essentially full access to the host. It can see all host devices, mount filesystems, load kernel modules, and escape the container with little effort. The legitimate uses are rare (some monitoring agents, GPU drivers in some configurations) and should be carefully scoped to specific pods, never used as a default.
- hostNetwork: true. The container shares the host's network namespace. It can bind host ports directly, see the host's network interfaces, and potentially observe traffic intended for other pods. Legitimate use cases are narrow (network plugins, some metrics collectors); most workloads do not need this.
- hostPID: true. The container shares the host's process namespace. It can see, and with sufficient permissions signal, processes outside its own container, including the host's processes. Legitimate uses are extremely rare; the privilege almost never matches what the workload actually needs.
- hostPath volumes. Mounting a host path into the container couples the pod to the host's filesystem layout and opens escape paths to the host. Most workloads should use ephemeral volumes or cluster-managed persistent volumes instead of direct hostPath mounts.
- Specific use cases only. When one of these privileges is genuinely needed, the pod carries an annotation with the justification, is confined to specific namespaces, and is reviewed periodically. The privilege is the exception, not the routine.
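The avoid list works better as a report than as a hard rule, because each item has legitimate exceptions. This hypothetical sketch flags privileged containers, hostNetwork, hostPID, and hostPath volumes in a full Pod object (metadata plus spec). It treats a finding as needing review only when the pod lacks a justification annotation. The annotation key `security.example.com/privilege-justification` is an invented placeholder for whatever convention a team adopts.

```python
from typing import Any

Pod = dict[str, Any]  # a full Pod object (metadata + spec) parsed into dicts

# Hypothetical annotation key recording why a pod needs an avoid-list privilege.
JUSTIFICATION_KEY = "security.example.com/privilege-justification"


def avoid_list_findings(pod: Pod) -> list[str]:
    """List every avoid-list privilege the pod uses."""
    spec = pod.get("spec", {})
    findings: list[str] = []
    for flag in ("hostNetwork", "hostPID"):
        if spec.get(flag) is True:
            findings.append(f"{flag}: true")
    for c in spec.get("initContainers", []) + spec.get("containers", []):
        if c.get("securityContext", {}).get("privileged") is True:
            findings.append(f"container {c.get('name', '<unnamed>')}: privileged: true")
    for v in spec.get("volumes", []):
        if "hostPath" in v:
            path = v["hostPath"].get("path")
            findings.append(f"volume {v.get('name', '<unnamed>')}: hostPath {path}")
    return findings


def needs_review(pod: Pod) -> bool:
    """True when the pod uses an avoid-list privilege without a recorded justification."""
    annotations = pod.get("metadata", {}).get("annotations", {})
    return bool(avoid_list_findings(pod)) and JUSTIFICATION_KEY not in annotations


agent = {
    "metadata": {
        "name": "node-agent",  # hypothetical monitoring agent
        "annotations": {JUSTIFICATION_KEY: "reads host process table for metrics"},
    },
    "spec": {"hostPID": True, "containers": [{"name": "agent"}]},
}
print(avoid_list_findings(agent))  # ['hostPID: true']
print(needs_review(agent))         # False
```

Justified pods still appear in `avoid_list_findings`, so the periodic review can revisit them instead of forgetting them.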
Pod security hardening reduces the blast radius of any compromise from "the cluster" to "the specific pod." Nova AI Ops audits pod specs against the must-have, should-have, and avoid lists, surfaces the deviation patterns, and tracks the cluster's security posture over time so the team can see whether the hardening discipline is improving or eroding.