Kubernetes · By Nova AI Ops Team · Published Sep 28, 2026 · 12 min read

Kubernetes Network Policies: A Practical Guide for SRE Teams

By default, every pod in your Kubernetes cluster can talk to every other pod. That is fine on day one and a security nightmare on day 365. NetworkPolicies are how you fix it without breaking everything.

Why Network Policies Matter

Kubernetes ships with an open-by-default network. Compromising any one pod gives the attacker lateral movement to every database, every internal API, and every secrets endpoint in the cluster. Many production breaches follow the same pattern: the attacker gets RCE on a low-value web pod, pivots to the database pod, and exfiltrates data. NetworkPolicies stop the lateral movement at the network layer.

The right mental model is microsegmentation. Each workload should only be able to talk to the specific dependencies it actually needs; everything else is denied by default. The hard part is not writing the policies. The hard part is auditing your existing services to figure out who actually talks to whom.

How Network Policies Actually Work

A NetworkPolicy is a Kubernetes object that selects pods (via labels) and defines ingress (incoming) and egress (outgoing) rules. Once at least one NetworkPolicy selects a pod, that pod is "isolated": only traffic explicitly allowed by some policy is permitted, and everything else is dropped.

The critical mental model: NetworkPolicies are additive, not subtractive. You cannot write a "deny X" policy. Instead, you isolate the pod (by selecting it in any policy) and then explicitly allow what should work. Everything not explicitly allowed is denied.
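
As an illustration of the additive model, suppose two policies select the same pods (the `app: web`, `app: frontend`, and `app: admin` labels here are hypothetical). The pod's allowed ingress is the union of both policies' rules:

```yaml
# Two policies selecting the same pods: traffic is allowed from
# app=frontend OR app=admin, because rules across policies are unioned.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-frontend
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-admin
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: admin
```

There is no ordering and no precedence between the two: adding a policy can only widen what is allowed, never narrow it.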

The enforcement happens in your CNI plugin. Calico, Cilium, and Antrea all support NetworkPolicies natively. Flannel does not, which is the most common reason teams discover their policies have no effect: they wrote a perfect policy, but their CNI is silently ignoring it.

The Deny-By-Default Pattern

The foundational policy every namespace should have:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # selects all pods in namespace
  policyTypes:
  - Ingress
  - Egress

This policy selects every pod in the namespace and isolates them. With no allow rules, all ingress and egress traffic is dropped. From here, you add explicit allow policies for each legitimate connection.

Critical gotcha: egress to DNS (kube-dns or CoreDNS) must be explicitly allowed. Without it, every pod loses the ability to resolve service names, and almost everything breaks. Always pair the deny-all with a DNS allow:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Common Policy Recipes

Recipe 1: Allow ingress only from a specific service. The payment-svc only accepts traffic from the api-gateway:

spec:
  podSelector:
    matchLabels:
      app: payment-svc
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-gateway
    ports:
    - port: 8080
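
One subtlety: a bare podSelector in a from clause only matches pods in the policy's own namespace. If the gateway runs in a different namespace (here a hypothetical gateway namespace), combine a namespaceSelector and a podSelector in the same from entry. In one entry the two selectors are ANDed; as separate list items they would be ORed:

```yaml
  ingress:
  - from:
    - namespaceSelector:        # AND: peer must be in the gateway namespace...
        matchLabels:
          kubernetes.io/metadata.name: gateway
      podSelector:              # ...AND carry the api-gateway label
        matchLabels:
          app: api-gateway
    ports:
    - port: 8080
```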

Recipe 2: Allow egress only to the database namespace. A worker pod that should only talk to the database, nothing else:

spec:
  podSelector:
    matchLabels:
      app: payment-worker
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: databases
    ports:
    - port: 5432
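
Wrapped into a complete manifest (the recipe fragments omit the boilerplate), this would look something like the sketch below; the policy name is made up. Note the explicit policyTypes: per the API defaulting rules, a policy without it is assumed to affect Ingress as well, so omitting it here would also isolate the worker's ingress and deny all incoming traffic:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-worker-egress   # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-worker
  policyTypes:
  - Egress            # explicit: without this, the policy also
                      # affects Ingress and denies all of it
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: databases
    ports:
    - protocol: TCP
      port: 5432
```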

Recipe 3: Allow ingress from a specific namespace. All pods in the monitoring namespace can scrape this app's /metrics endpoint:

spec:
  podSelector:
    matchLabels:
      app: my-app
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
    ports:
    - port: 9090

Recipe 4: Block egress to the metadata API. A common attacker pivot point on AWS, GCP, and Azure is the cloud metadata endpoint. Block it explicitly:

spec:
  podSelector:
    matchLabels:
      tier: untrusted
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32  # cloud metadata IP
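
Because policies are additive, this block only holds if no other policy allows the metadata IP to these pods. A stricter variant for untrusted workloads also carves out the RFC 1918 private ranges, leaving only public-internet egress (a sketch; adjust the ranges to your actual VPC layout):

```yaml
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32  # cloud metadata IP
        - 10.0.0.0/8          # RFC 1918 private ranges: blocks
        - 172.16.0.0/12       # lateral movement to internal
        - 192.168.0.0/16      # and VPC addresses
```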

Debugging Dropped Traffic

The hardest part of NetworkPolicies is not writing them; it is figuring out why traffic that should work doesn't. Three diagnostic patterns:

1. Use the CNI's policy-test tools. Calico ships calicoctl, Cilium ships cilium policy trace. Both let you simulate "would this connection succeed?" without actually attempting it.

2. Watch the connection from inside the source pod. Exec into the source pod and run nc -zv <dest> <port>. A policy-dropped connection hangs and times out; a reachable open port connects immediately, and a reachable host with nothing listening is refused immediately. The timeout is what distinguishes a network-layer drop from an application problem.

3. Read your CNI's flow logs. Cilium's Hubble UI shows real-time flow data with policy verdicts. Calico's logging policy lets you log specific dropped packets. Both are essential for production network policy debugging.
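
A useful trick that complements all three patterns: launch a throwaway pod that copies the labels of the workload under test, so it is selected by exactly the same policies, then probe from inside it. A sketch using the community nicolaka/netshoot image (the app: payment-worker label is an assumption; copy your real workload's labels):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: policy-debug
  namespace: production
  labels:
    app: payment-worker        # copy the labels of the pod you are debugging
spec:
  containers:
  - name: debug
    image: nicolaka/netshoot   # bundles nc, dig, tcpdump, curl, etc.
    command: ["sleep", "3600"]
```

Then exec into it, e.g. kubectl exec -it policy-debug -- nc -zv <dest> <port>, and delete the pod when done.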

Common Gotchas

Gotcha 1: NetworkPolicies do not block traffic between containers in the same pod. If your sidecar talks to your main container over localhost, no NetworkPolicy can stop it.

Gotcha 2: NetworkPolicies do not affect traffic on the host network. Pods using hostNetwork: true bypass the CNI entirely. NetworkPolicies have no effect on them.

Gotcha 3: Selecting "all pods in namespace X" requires the namespace to have a label. The Kubernetes 1.22+ default label kubernetes.io/metadata.name makes this easier, but older clusters need manual labels on every namespace.
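
On clusters that predate the automatic label, apply one yourself, e.g. kubectl label namespace monitoring name=monitoring (the name key is an arbitrary convention, not a built-in), and select on it:

```yaml
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring   # manually applied label on older clusters
```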

Gotcha 4: External traffic via Service or Ingress is still subject to NetworkPolicies. Traffic that enters through an ingress controller arrives from the controller's pod, so your ingress rules must allow that pod, or all external traffic gets dropped at the policy layer.

For teams that want to automate NetworkPolicy authoring based on observed traffic patterns, tools like Nova AI Ops and dedicated network observability platforms can analyze actual pod-to-pod communication and generate the deny-by-default plus allow-explicit policies that match your real architecture. Try Nova.