Multi-Cluster Egress Security
Multi-cluster setups need a consistent egress policy. This covers the common patterns and how to enforce them.
The egress security model
Default-deny egress at the cluster boundary. Every outbound destination a pod needs must be explicitly allowed by policy. This limits lateral movement and exfiltration after a compromise and bounds the blast radius.
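A minimal sketch of the default-deny starting point, assuming the standard Kubernetes NetworkPolicy API. The function name and the policy name "default-deny-egress" are illustrative choices, not taken from any particular setup. NetworkPolicy is namespaced, so the deny is applied once per namespace.

```python
def default_deny_egress(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that denies all egress in one namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # The policy name is a hypothetical convention.
        "metadata": {"name": "default-deny-egress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Egress is listed as a policy type but no egress rules are given,
            # so no outbound traffic is allowed for the selected pods.
            "policyTypes": ["Egress"],
        },
    }
```

Serialising the returned dict with json.dumps gives a manifest that kubectl accepts. A policy like this also blocks DNS lookups, so a companion rule allowing DNS is usually needed before anything else works.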
Allowlist by FQDN where possible. IP-based allowlists rot as cloud services change IPs. Modern egress controllers (Cilium, Calico Enterprise) support FQDN policies.
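To illustrate what an FQDN allowlist decides, here is a small sketch of hostname matching with exact names and leading-wildcard entries. The matching rules are an assumption made for illustration; they are not the semantics of Cilium, Calico Enterprise, or any other controller, and the function name is hypothetical.

```python
def fqdn_allowed(hostname: str, allowlist: list[str]) -> bool:
    """Return True if hostname matches an exact or '*.'-prefixed allowlist entry."""
    # Normalise case and drop the trailing dot of a fully qualified name.
    host = hostname.lower().rstrip(".")
    for entry in allowlist:
        pattern = entry.lower().rstrip(".")
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".example.com"
            # Wildcard entries match subdomains only, never the bare domain.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    # Anything not listed is denied.
    return False
```

Matching on the suffix including its leading dot is deliberate: it keeps a look-alike such as evilexample.com from matching an entry for *.example.com.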
Logging on every egress flow. Compromise detection requires the audit trail. Without logs, you find out about compromise from the news.
Centralised egress patterns
Egress through a shared VPC or proxy. All clusters route outbound through one path. Single point to enforce policy and inspect traffic.
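An egress proxy (Squid or a vendor product) inspects outbound traffic. TLS inspection is invasive but can catch credential exfiltration that would otherwise be hidden inside encrypted traffic. The trade-off is added operational complexity, since workloads must trust the proxy's CA certificate.
Per-cluster egress is simpler to operate. Centralised egress gives uniform policy, at the price of a potential single point of failure and extra infrastructure cost.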
Policy enforcement
NetworkPolicy resources for in-cluster egress. They only take effect when the CNI enforces them (Calico and Cilium do), and then give pod-level control.
Beyond the cluster: cloud security groups, route tables, transit gateway rules. Each layer is enforced independently, which gives defence in depth.
Per-namespace egress allowlists. Different teams have different legitimate destinations. Per-namespace policy is granular and reviewable.
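One way to keep per-namespace allowlists reviewable is to hold them as data and generate the policies from it. The sketch below assumes the standard NetworkPolicy API, which matches egress on IP blocks and ports; FQDN rules would need CNI-specific resources instead. The function name, the policy name, and the TCP-only assumption are illustrative.

```python
def namespace_egress_policies(
    allowlists: dict[str, list[tuple[str, int]]],
) -> list[dict]:
    """Turn {namespace: [(cidr, tcp_port), ...]} into NetworkPolicy manifests."""
    policies = []
    # Sort namespaces so the generated output is stable and easy to diff.
    for namespace, destinations in sorted(allowlists.items()):
        rules = [
            {
                "to": [{"ipBlock": {"cidr": cidr}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }
            for cidr, port in destinations
        ]
        policies.append({
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            # The policy name is a hypothetical convention.
            "metadata": {"name": "egress-allowlist", "namespace": namespace},
            "spec": {
                "podSelector": {},          # applies to every pod in the namespace
                "policyTypes": ["Egress"],
                "egress": rules,            # only these destinations are allowed
            },
        })
    return policies
```

Because the input is a plain mapping, a change to one team's destinations shows up as a small diff against that team's namespace only.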
Auditing egress
VPC flow logs at the cloud layer. Pod egress logs at the cluster layer. Both feed a SIEM for correlation and detection.
Anomaly detection: traffic to new external IPs, sudden volume changes, traffic at unusual hours. The SIEM flags these for review.
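The three signals can be sketched as simple rules over flow records. This is a toy illustration, not how any particular SIEM works: the Flow fields, the fixed baseline, the 3x volume factor, and the 00:00 to 06:00 quiet window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    dst_ip: str
    bytes_sent: int
    hour_utc: int  # hour of day, 0-23


def flag_anomalies(flows, known_ips, baseline_bytes,
                   volume_factor=3.0, quiet_hours=range(0, 6)):
    """Return (flow, reasons) pairs for flows that trip at least one rule."""
    findings = []
    for flow in flows:
        reasons = []
        if flow.dst_ip not in known_ips:
            reasons.append("new destination")
        if flow.bytes_sent > volume_factor * baseline_bytes:
            reasons.append("volume spike")
        if flow.hour_utc in quiet_hours:
            reasons.append("unusual hour")
        if reasons:
            findings.append((flow, reasons))
    return findings
```

In practice the baseline and the set of known destinations would be learned from historical flow logs rather than passed in as constants.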
Quarterly audit: does any egress look unintended? This often catches forgotten test integrations and abandoned services that still phone home.
Operating multi-cluster egress security
Standard policy template across clusters. Per-cluster customisation is the exception, not the rule. Drift across clusters defeats the security posture.
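Drift against the standard template can be detected by comparing digests of the policy each cluster actually runs. A minimal sketch, assuming the policies have already been fetched and parsed into dicts; the function names and the choice of SHA-256 over canonical JSON are illustrative.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """Hash a policy in a canonical form so key order does not matter."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_clusters(template: dict, deployed: dict[str, dict]) -> list[str]:
    """Return names of clusters whose deployed policy differs from the template."""
    expected = policy_digest(template)
    return sorted(
        name for name, policy in deployed.items()
        if policy_digest(policy) != expected
    )
```

Clusters with an approved customisation would need their own expected digest, which also makes the exceptions explicit.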
Test policies in non-prod before promoting. Egress changes can break legitimate traffic; test runs catch the regressions.
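A non-prod test run can be approximated by replaying observed destinations against the candidate policy and listing what would newly be blocked. This sketch assumes IP/CIDR allowlists and destinations taken from flow logs; the function names are hypothetical.

```python
import ipaddress


def allowed(dst_ip: str, cidrs: list[str]) -> bool:
    """True if dst_ip falls inside any of the allowed CIDR blocks."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def regressions(observed_ips, current_cidrs, candidate_cidrs) -> list[str]:
    """Destinations the current policy allows but the candidate would block."""
    return sorted({
        ip for ip in observed_ips
        if allowed(ip, current_cidrs) and not allowed(ip, candidate_cidrs)
    })
```

An empty result only says that recently observed traffic is unaffected; destinations used rarely (monthly jobs, failover paths) will not show up in a short observation window.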
Document the allowlist. Each entry has an owner and a justification. Periodic review removes stale entries.
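The documentation can live next to the policy as structured data, which makes the periodic review mechanical. A sketch under stated assumptions: the field names and the 90-day review window are illustrative, not a recommendation from any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AllowlistEntry:
    destination: str
    owner: str
    justification: str
    last_reviewed: date


def stale_entries(entries, today: date, max_age=timedelta(days=90)):
    """Return entries whose last review is older than max_age."""
    return [e for e in entries if today - e.last_reviewed > max_age]
```

Entries returned by stale_entries go back to their owner, who either renews the justification or removes the destination.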