Kubernetes Network Policies: A Practical Guide for SRE Teams
By default, every pod in your Kubernetes cluster can talk to every other pod. That is fine on day one and a security nightmare on day 365. NetworkPolicies are how you fix it without breaking everything.
Why Network Policies Matter
Kubernetes ships with an open-by-default network. The compromise of any one pod gives the attacker lateral movement to every database, every internal API, and every secrets endpoint in the cluster. The classic breach pattern follows directly from this: the attacker gets RCE on a low-value web pod, pivots to the database pod, and exfiltrates data. NetworkPolicies stop the lateral movement at the network layer.
The right mental model is microsegmentation: each workload should only be able to talk to the specific dependencies it actually needs, and everything else is denied by default. The hard part is not writing the policies; it is auditing your existing services to figure out who actually talks to whom.
How Network Policies Actually Work
A NetworkPolicy is a Kubernetes object that selects pods (via labels) and defines ingress (incoming) and egress (outgoing) rules. Once at least one NetworkPolicy selects a pod, that pod is "isolated" for the directions listed in the policy's policyTypes: only traffic explicitly allowed by some policy is permitted, and everything else is dropped.
The critical mental model: NetworkPolicies are additive, not subtractive. You cannot write a "deny X" policy. Instead, you isolate the pod (by selecting it in any policy) and then explicitly allow what should work. Everything not explicitly allowed is denied.
The enforcement happens in your CNI plugin. Calico, Cilium, and Antrea all support NetworkPolicies natively. Flannel does not, which is a common reason teams discover their policies have no effect: they wrote a perfect policy, but their CNI is silently ignoring it.
The Deny-By-Default Pattern
The foundational policy every namespace should have:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}   # selects all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
This policy selects every pod in the namespace and isolates them. With no allow rules, all ingress and egress traffic is dropped. From here, you add explicit allow policies for each legitimate connection.
Critical gotcha: egress to DNS (kube-dns or CoreDNS) must be explicitly allowed. Without it, every pod loses the ability to resolve service names, and almost everything breaks. Always pair the deny-all with a DNS allow:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
  - to:
    # namespaceSelector and podSelector in the same entry: both must match
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
Common Policy Recipes
Recipe 1: Allow ingress only from a specific service. The payment-svc accepts traffic only from the api-gateway (a from entry with only a podSelector matches pods in the policy's own namespace):
spec:
  podSelector:
    matchLabels:
      app: payment-svc
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - port: 8080
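If the api-gateway instead runs in its own namespace, put a namespaceSelector and the podSelector in the same from entry so both must match; splitting them into two entries would allow traffic that matches either one. A sketch, assuming a namespace named gateway:
spec:
  podSelector:
    matchLabels:
      app: payment-svc
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: gateway   # assumed namespace name
      podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - port: 8080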
Recipe 2: Allow egress only to the database namespace. A worker pod that should only talk to the database, nothing else (DNS resolution still works because the namespace-wide allow-dns policy above covers it):
spec:
  podSelector:
    matchLabels:
      app: payment-worker
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: databases
    ports:
    - port: 5432
Recipe 3: Allow ingress from a specific namespace. All pods in the monitoring namespace can scrape this app's /metrics endpoint:
spec:
  podSelector:
    matchLabels:
      app: my-app
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - port: 9090
Recipe 4: Block egress to the metadata API. A common attacker pivot point on AWS, GCP, and Azure is the cloud metadata endpoint. Because policies are additive, "blocking" it means writing an allow rule that covers everything except the metadata IP:
spec:
  podSelector:
    matchLabels:
      tier: untrusted
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32   # cloud metadata IP
Debugging Dropped Traffic
The hardest part of NetworkPolicies is not writing them; it is figuring out why traffic that should work doesn't. Three diagnostic patterns:
1. Use the CNI's policy-test tools. Calico ships calicoctl; Cilium ships cilium policy trace. Both let you simulate "would this connection succeed?" without actually attempting it.
2. Watch the connection from inside the source pod. Exec into the source pod and run nc -zv <dest> <port>. A blocked connection hangs and times out; an open connection returns immediately. The difference tells you whether the policy is the problem.
3. Read your CNI's flow logs. Cilium's Hubble UI shows real-time flow data with policy verdicts. Calico's logging policy lets you log specific dropped packets. Both are essential for production network policy debugging.
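A fourth, cruder check: temporarily layer an allow-all policy over the pod you are debugging. Because policies are additive, this does not disable any other policy; it just adds a broad allow. If the connection starts working, a NetworkPolicy was dropping it; if it still fails, the problem is elsewhere (DNS, the application, a cloud security group). A minimal sketch, reusing the payment-worker label from Recipe 2; delete it when you are done:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: debug-allow-all     # temporary; remove once the investigation is done
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-worker   # the pod under investigation (assumed label)
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}                      # empty rule = allow all ingress
  egress:
  - {}                      # empty rule = allow all egress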
Common Gotchas
Gotcha 1: NetworkPolicies do not block traffic between containers in the same pod. If your sidecar talks to your main container over localhost, no NetworkPolicy can stop it.
Gotcha 2: NetworkPolicies do not affect traffic on the host network. Pods running with hostNetwork: true bypass the CNI entirely, so NetworkPolicies have no effect on them.
Gotcha 3: Selecting "all pods in namespace X" requires the namespace to have a label. The Kubernetes 1.22+ default label kubernetes.io/metadata.name makes this easier, but older clusters need manual labels on every namespace.
Gotcha 4: External traffic via Service or Ingress is subject to NetworkPolicies. An Ingress Controller pod must be allowed in your policies, or all external traffic gets dropped at the policy layer.
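A sketch of the allow rule Gotcha 4 calls for, assuming an ingress controller running in a namespace named ingress-nginx and an app listening on port 8080; adjust both to your environment:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
    ports:
    - port: 8080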
For teams that want to automate NetworkPolicy authoring based on observed traffic, tools like Nova AI Ops and dedicated network observability platforms can analyze actual pod-to-pod communication and generate the deny-by-default plus explicit-allow policies that match your real architecture. Try Nova.