Multi-Cluster Egress Security

Multi-cluster setups need consistent egress policy. The sections below cover the common patterns and how they are enforced.

The egress security model

The egress security model rests on default-deny, FQDN-based allowlists where possible, and logging on every egress flow. Default-deny limits lateral movement after compromise and bounds blast radius; FQDN allowlists outlive IP-based ones because cloud services change IPs; logs make compromise detectable rather than newsworthy.

Default-deny egress. Pods must explicitly allow outbound destinations, which limits lateral movement after a compromise and bounds the blast radius.
FQDN allowlists. IP-based allowlists rot as cloud services change IPs; modern egress controllers (Cilium, Calico Enterprise) support FQDN policies.
Logging on every flow. Compromise detection requires the audit trail; without logs, you find out about compromise from the news.
Per-cluster baseline. Default-deny plus FQDN allowlists plus logging is the shared baseline across every cluster in the fleet.
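
The three baseline elements above can be sketched as a small decision function. This is an illustrative model only, not how Cilium or Calico evaluate policy: the names `EgressDecision`, `matches`, and `evaluate` are hypothetical, and the wildcard rule (a leading `*.` matches any subdomain depth) is an assumption made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List, Optional

logger = logging.getLogger("egress")


@dataclass(frozen=True)
class EgressDecision:
    namespace: str
    fqdn: str
    allowed: bool
    matched_rule: Optional[str]


def matches(pattern: str, fqdn: str) -> bool:
    """Exact match, or suffix match when the pattern starts with '*.' (assumed semantics)."""
    pattern = pattern.lower().rstrip(".")
    fqdn = fqdn.lower().rstrip(".")
    if pattern.startswith("*."):
        # "*.example.com" -> any name ending in ".example.com", but not "example.com" itself
        return fqdn.endswith(pattern[1:])
    return fqdn == pattern


def evaluate(namespace: str, fqdn: str, allowlists: Dict[str, List[str]]) -> EgressDecision:
    """Default-deny: a flow is allowed only if a rule for its namespace matches."""
    decision = EgressDecision(namespace, fqdn, False, None)
    for pattern in allowlists.get(namespace, ()):
        if matches(pattern, fqdn):
            decision = EgressDecision(namespace, fqdn, True, pattern)
            break
    # Log every decision, allowed or denied, so the audit trail is complete.
    logger.info(
        "egress namespace=%s fqdn=%s allowed=%s rule=%s",
        decision.namespace, decision.fqdn, decision.allowed, decision.matched_rule,
    )
    return decision
```

A namespace with no allowlist entry gets no outbound access at all, which is the default-deny behaviour; denied flows are logged as well as allowed ones, since the denied ones are often the interesting signal.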

Centralised egress patterns

Centralised egress routes all clusters’ outbound traffic through one path, giving a single point to enforce policy and inspect traffic. Per-cluster egress is operationally simpler but gives up uniform policy enforcement; the trade-off depends on how much cost and single-point-of-failure risk the team will accept in exchange for inspection.

Shared VPC or proxy. All clusters route outbound through one path; single point to enforce policy and inspect traffic.
Egress proxy with TLS (SSL) inspection. Squid or a vendor solution decrypts and re-encrypts outbound TLS; invasive, because workloads must trust the proxy's CA, but it can catch credential exfiltration.
Per-cluster simplicity trade-off. Per-cluster egress is simpler operationally; centralised egress gets uniform policy at the price of a single point of failure and higher running cost.
Per-org choice. The centralised-versus-per-cluster decision is made and documented once per organisation, so every team works to the same policy.

Policy enforcement

Policy enforcement spans layers: NetworkPolicy resources plus a CNI that enforces them for in-cluster egress; cloud security groups, route tables, and transit gateway rules beyond the cluster; and per-namespace allowlists for granular team-level control. Defence in depth means each layer is enforced independently.

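NetworkPolicy plus CNI. Pod-level egress control; needs a CNI that enforces NetworkPolicy (Calico, Cilium), because the resources are accepted by the API server but have no effect when the CNI does not implement them.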
Cloud-layer rules. Security groups, route tables, transit gateway rules; each layer enforced independently.
Per-namespace allowlists. Different teams have different legitimate destinations; per-namespace policy is granular and reviewable.
Per-policy ownership. Each allowlist entry has a named owner, which gives audits and periodic reviews someone accountable for it.
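
As a sketch of the in-cluster layer, the snippet below builds two standard Kubernetes NetworkPolicy manifests per namespace: a default-deny egress policy and an owned allow rule. The function names and the annotation key `egress.example.com/owner` are hypothetical; plain NetworkPolicy only understands IP blocks and selectors, so FQDN rules would need the CNI's own policy resources, which are not shown here.

```python
import json
from typing import List


def default_deny_egress(namespace: str) -> dict:
    """Select every pod in the namespace and declare Egress with no rules, which denies all egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Egress"]},
    }


def allow_egress(namespace: str, name: str, owner: str, cidrs: List[str], port: int) -> dict:
    """Allow TCP egress to the given CIDRs and record the owner in an annotation (hypothetical key)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"egress.example.com/owner": owner},
        },
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON manifests can be applied the same way as YAML ones.
    policies = [
        default_deny_egress("team-a"),
        allow_egress("team-a", "allow-partner-api", "team-a@example.com", ["203.0.113.0/24"], 443),
    ]
    print(json.dumps(policies, indent=2))
```

NetworkPolicy rules are additive, so the allow rule opens only the listed destination on top of the deny baseline. Note that the deny baseline also blocks DNS; a real rollout would need an explicit rule for the cluster's resolver.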

Auditing egress

Auditing closes the loop. VPC flow logs at the cloud layer and pod egress logs at the cluster layer both feed a SIEM for correlation and detection. Anomaly detection on new external IPs, sudden volume changes, and unusual hours surfaces compromise; a quarterly audit catches forgotten test integrations and abandoned phone-home services.

VPC flow logs plus pod egress logs. Both layers feed the SIEM for correlation and detection.
Anomaly detection. Traffic to new external IPs, sudden volume changes, and traffic at unusual hours; the SIEM flags each for review.
Quarterly egress audit. Catches forgotten test integrations and abandoned services with phone-home behaviour.
Per-anomaly investigation playbook. Documented response steps for each anomaly type let responders triage quickly when the SIEM flags real exfiltration.
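
The three anomaly signals named above can be illustrated with a minimal rule-based check. This is a sketch under stated assumptions, not a SIEM rule format: the `Flow` record, the `find_anomalies` function, the 5x volume threshold, and the 07:00 to 20:00 working window are all hypothetical choices for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Flow:
    timestamp: datetime
    source: str      # e.g. "namespace/workload"
    dest_ip: str
    bytes_out: int


def find_anomalies(
    flows: Iterable[Flow],
    known_ips: Set[str],
    baseline_bytes: Dict[str, int],
    volume_factor: float = 5.0,          # assumed threshold: 5x the usual volume
    work_hours: range = range(7, 20),    # assumed normal window: 07:00 to 19:59
) -> List[Tuple[str, str, str]]:
    """Return (kind, source, detail) findings for review; nothing is blocked here."""
    findings: List[Tuple[str, str, str]] = []
    totals: Dict[str, int] = {}
    for flow in flows:
        if flow.dest_ip not in known_ips:
            findings.append(("new_destination", flow.source, flow.dest_ip))
        if flow.timestamp.hour not in work_hours:
            findings.append(("unusual_hour", flow.source, flow.dest_ip))
        totals[flow.source] = totals.get(flow.source, 0) + flow.bytes_out
    # Compare each source's total in this batch against its baseline, when one exists.
    for source, total in totals.items():
        baseline = baseline_bytes.get(source)
        if baseline and total > volume_factor * baseline:
            findings.append(("volume_spike", source, str(total)))
    return findings
```

Each finding kind maps naturally onto one playbook entry, so the output of a check like this is a list of things to investigate rather than a verdict.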

Operating multi-cluster egress security

Operating multi-cluster egress security needs a standard policy template, non-prod testing before promotion, and a documented allowlist with owners and justifications. Drift across clusters defeats the security posture; egress changes can break legitimate traffic, so test runs are the safety net.

Standard policy template. Per-cluster customisation is the exception, not the rule; drift across clusters defeats the security posture.
Non-prod test before promotion. Egress changes can break legitimate traffic; test runs in non-prod catch the regressions first.
Documented allowlist. Each entry has an owner and a justification; periodic review removes stale entries.
Per-quarter allowlist review. Removing stale entries every quarter keeps the allowlist in line with least privilege over time.
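
The drift and review checks described above can be sketched in a few lines. Everything here is illustrative: the policy documents are plain dictionaries standing in for whatever template format the fleet uses, and the names `AllowlistEntry`, `fingerprint`, `drifted_clusters`, and `needs_review`, along with the 90-day review window, are assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AllowlistEntry:
    fqdn: str
    owner: str
    justification: str
    last_reviewed: date


def fingerprint(policy: dict) -> str:
    """Hash a canonical JSON form so that key order does not count as drift."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_clusters(template: dict, cluster_policies: Dict[str, dict]) -> List[str]:
    """Names of clusters whose applied policy differs from the standard template."""
    expected = fingerprint(template)
    return sorted(
        name for name, policy in cluster_policies.items()
        if fingerprint(policy) != expected
    )


def needs_review(
    entries: Iterable[AllowlistEntry],
    today: date,
    max_age: timedelta = timedelta(days=90),  # assumed quarterly window
) -> List[AllowlistEntry]:
    """Entries with no owner, no justification, or a review older than max_age."""
    return [
        entry for entry in entries
        if not entry.owner
        or not entry.justification
        or today - entry.last_reviewed > max_age
    ]
```

Run against every cluster on a schedule, a check like `drifted_clusters` turns "customisation is the exception" into something that can be reported, and `needs_review` produces the worklist for the quarterly review.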