NAT Gateway Cost Management
NAT gateway fees can dominate a bill. These patterns contain the cost without sacrificing security.
Audit
NAT gateway costs are one of the most predictable lines on a cloud bill. Each gateway is billed per hour, and every gigabyte it processes for private subnets adds a per-GB data processing charge. Without active management, NAT costs grow with traffic and become a meaningful share of network spend. Audit is the first step; understanding where NAT bytes come from is what enables the rest of the discipline.
What a good audit looks like:
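- Bytes processed per NAT gateway.: The team enumerates each NAT gateway and the bytes it processed. The data is in CloudWatch, where the AWS/NATGateway namespace publishes byte counters such as BytesOutToDestination and BytesInFromDestination; the audit aggregates it monthly. Top-spending NATs surface immediately.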
- Sort by cost.: The NATs are sorted by monthly cost (gigabytes processed times the per-GB rate, plus the hourly charge). The top-spending NATs are the optimization targets; the bottom-spending NATs need less attention.
- Outsized NATs are often servicing chatty workloads.: A NAT with disproportionate traffic is a signal. The workloads behind it are sending a lot of bytes through; investigation often finds workloads that should not be reaching out to the internet at all.
- Should not need internet.: Many workloads that route through NAT are legitimately reaching the internet (calling external APIs, downloading dependencies). Some are reaching AWS services that should go through VPC endpoints; some are reaching internal services that should not need NAT.
- Investigate top traffic sources.: Beyond the per-NAT view, the team investigates which workloads are generating the most NAT traffic. Per-workload attribution requires VPC flow logs or similar; the investigation produces the targets for the next stage.
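Once monthly byte totals are in hand, the sort-by-cost step is plain arithmetic. The sketch below is minimal and assumes the per-gateway totals have already been pulled from CloudWatch. The rates are placeholders roughly in line with us-east-1 list prices, and treating a GB as 2**30 bytes is also an assumption, so check both against current pricing for your Region. `NatUsage`, `monthly_cost`, and `rank_by_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Placeholder rates (assumptions); replace with current pricing for your Region.
PER_GB_RATE = 0.045       # USD per GB processed
HOURLY_RATE = 0.045       # USD per gateway-hour
HOURS_PER_MONTH = 730
BYTES_PER_GB = 1024 ** 3  # assumed billing convention


@dataclass
class NatUsage:
    nat_id: str
    bytes_processed: int  # monthly total, aggregated from CloudWatch


def monthly_cost(usage: NatUsage) -> float:
    """Processing charge plus the hourly charge for one gateway-month."""
    gb = usage.bytes_processed / BYTES_PER_GB
    return gb * PER_GB_RATE + HOURS_PER_MONTH * HOURLY_RATE


def rank_by_cost(usages: List[NatUsage]) -> List[Tuple[str, float]]:
    """Return (nat_id, monthly cost) pairs, most expensive first."""
    costs = [(u.nat_id, monthly_cost(u)) for u in usages]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    fleet = [
        NatUsage("nat-a", 40 * BYTES_PER_GB),
        NatUsage("nat-b", 12_000 * BYTES_PER_GB),
        NatUsage("nat-c", 900 * BYTES_PER_GB),
    ]
    for nat_id, cost in rank_by_cost(fleet):
        print(f"{nat_id}: ${cost:,.2f}")
```

At these assumed rates, a gateway that processes 1,000 GB in a month costs about $45 in processing plus about $32.85 in hourly charges. This is why the processing component, not the gateway count, usually decides the ranking.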
Audit is the foundation. Without knowing where NAT costs come from, optimization is guesswork.
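Per-workload attribution can start from VPC flow logs captured on the NAT gateway's network interface. The sketch below is minimal. It assumes records in the default (version 2) format and treats any source address inside the VPC CIDR as the originating workload. Which address fields identify the workload depends on where and how the logs are captured, so the filter is a starting point rather than a definitive rule. The interface ID, CIDR, and `top_talkers` are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Field order of the default (version 2) VPC flow log record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def top_talkers(lines: Iterable[str], nat_eni_id: str, vpc_cidr: str,
                nat_private_ip: Optional[str] = None,
                n: int = 10) -> List[Tuple[str, int]]:
    """Sum bytes by in-VPC source address for records on the NAT's interface."""
    vpc = ipaddress.ip_network(vpc_cidr)
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS) or parts[0] == "version":
            continue  # header line or a non-default record format
        rec = dict(zip(FIELDS, parts))
        if rec["interface_id"] != nat_eni_id or rec["bytes"] == "-":
            continue  # another interface, or a NODATA/SKIPDATA record
        if rec["srcaddr"] == nat_private_ip:
            continue  # the gateway's own address is not a workload
        try:
            src = ipaddress.ip_address(rec["srcaddr"])
        except ValueError:
            continue
        if src in vpc:
            totals[rec["srcaddr"]] += int(rec["bytes"])
    return totals.most_common(n)
```

Mapping the resulting private IPs back to instances, pods, or services is the step that turns these totals into the targets for the next stage.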
Avoidance patterns
The cheapest NAT byte is the one that does not happen. Many traffic patterns that route through NAT could be redirected to avoid the cost entirely. The avoidance patterns are well-known; applying them produces real savings.
- VPC endpoints for AWS services.: Traffic to AWS services (S3, DynamoDB, ECR, KMS, similar) can route through VPC endpoints instead of NAT. The endpoints bypass NAT entirely; the traffic stays on the AWS backbone.
- S3 and DynamoDB endpoints are free.: Gateway endpoints for S3 and DynamoDB carry no hourly or per-GB charge. Every VPC that uses these services should have them. They only cover traffic to the service in the VPC's own Region, but within that scope the savings are pure: nothing to lose.
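- ECR endpoints are cheap.: ECR (Elastic Container Registry) is often a high-volume service: every container pull goes through it. Its interface endpoints (ecr.api and ecr.dkr) carry an hourly charge per AZ plus a per-GB charge, unlike gateway endpoints, but they are much cheaper than NAT processing for the same bytes. Image layers are served from S3, so the S3 gateway endpoint belongs alongside them.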
- Internal services should not use NAT.: Calls between internal services should not route through NAT. They should use VPC peering, Transit Gateway, or PrivateLink. NAT for internal traffic is a configuration error to fix.
- Caching for external dependencies.: External API calls, package downloads, and similar can be cached internally. The first call goes out; subsequent calls hit the cache. The cache hit avoids the NAT byte entirely.
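Whether an interface endpoint pays for itself is a break-even calculation. Its fixed hourly charge per AZ is weighed against the per-GB difference between NAT processing and endpoint processing. The sketch below is minimal and uses placeholder rates: these are assumptions roughly in line with us-east-1 list prices, so check current pricing. Only the NAT per-GB charge is compared, because the gateway's hourly charge remains as long as other traffic still needs the gateway. The function names are hypothetical.

```python
# Placeholder rates (assumptions); replace with current pricing for your Region.
NAT_PER_GB = 0.045       # NAT gateway data processing, USD per GB
ENDPOINT_HOURLY = 0.01   # interface endpoint, USD per endpoint per AZ-hour
ENDPOINT_PER_GB = 0.01   # interface endpoint data processing, USD per GB
HOURS_PER_MONTH = 730


def nat_processing_cost(gb: float) -> float:
    """Monthly NAT processing charge for this traffic."""
    return gb * NAT_PER_GB


def gateway_endpoint_cost(gb: float) -> float:
    """S3 and DynamoDB gateway endpoints carry no charge, whatever the volume."""
    return 0.0


def interface_endpoint_cost(gb: float, azs: int, endpoints: int = 1) -> float:
    """Hourly charge for each endpoint in each AZ, plus per-GB processing."""
    fixed = ENDPOINT_HOURLY * HOURS_PER_MONTH * azs * endpoints
    return fixed + gb * ENDPOINT_PER_GB


def break_even_gb(azs: int, endpoints: int = 1) -> float:
    """Monthly volume above which the interface endpoints beat NAT processing."""
    fixed = ENDPOINT_HOURLY * HOURS_PER_MONTH * azs * endpoints
    return fixed / (NAT_PER_GB - ENDPOINT_PER_GB)


if __name__ == "__main__":
    # ECR uses two interface endpoints (ecr.api and ecr.dkr); three AZs assumed.
    print(f"break-even: {break_even_gb(azs=3, endpoints=2):,.0f} GB/month")
```

Under these assumed rates, the two ECR interface endpoints across three AZs break even at roughly 1,250 GB a month. Above that volume, every additional GB saves the difference between the two per-GB rates. Gateway endpoints need no such calculation.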
The avoidance patterns are the highest-leverage cost reductions. They eliminate cost rather than just reducing it.
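The caching pattern comes down to a read-through cache in front of external calls. Every hit is a response that does not cross the NAT again. The sketch below is a minimal in-process version with a fixed TTL. A shared deployment would more likely use a caching proxy or artifact mirror. `TtlCache` and its byte counters are hypothetical names, and the clock is injectable only so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Read-through cache for an external fetch function, with a fixed TTL."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.bytes_fetched = 0  # bytes that went out (through NAT)
        self.bytes_avoided = 0  # bytes served from the cache instead

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.bytes_avoided += len(entry[1])
            return entry[1]
        # Miss or expired entry: go out to the external source and cache it.
        body = self._fetch(key)
        self._entries[key] = (now, body)
        self.bytes_fetched += len(body)
        return body
```

The ratio of `bytes_avoided` to `bytes_fetched` gives a direct estimate of the NAT processing the cache removes. The TTL trades that saving against how stale a dependency is allowed to get.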
Scale wisely
NAT scaling matters for high-availability and capacity. The defaults are usually right; over-provisioning produces unnecessary cost; under-provisioning produces capacity issues. Understanding the scaling characteristics prevents both errors.
- Multi-NAT for HA.: High-availability deployments use multiple NAT gateways across AZs. Each AZ has its own NAT; private subnets in each AZ route through their AZ's NAT. The multi-AZ architecture survives any single AZ failure.
- One per AZ is standard.: The standard pattern is one NAT gateway per AZ in use. Three-AZ deployments have three NAT gateways. The cost is bounded; the HA is real.
- Don't over-provision.: Some teams add multiple NAT gateways per AZ for capacity. AWS NAT gateways scale automatically up to 100 Gbps per gateway. Most workloads do not approach this limit; over-provisioning produces waste.
- Scaling is automatic.: Each gateway scales on its own up to 100 Gbps, so the team does not need to add capacity preemptively. If a gateway runs into its limits, CloudWatch metrics such as PacketsDropCount and ErrorPortAllocation surface the problem in time to respond.
- Cross-AZ traffic is a hidden cost.: Routing private subnets in AZ-A through a NAT in AZ-B incurs cross-AZ data transfer costs. Always route through the same-AZ NAT; the per-AZ pattern produces this naturally.
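The same-AZ rule can be checked mechanically once three things are tabulated: subnet AZs, NAT route targets, and NAT gateway AZs. The EC2 DescribeSubnets, DescribeRouteTables, and DescribeNatGateways APIs are one source for this data. The sketch below is minimal and assumes the data has already been collected into plain dictionaries; the function names and IDs are hypothetical. It flags subnets whose NAT sits in another AZ, and AZs that have NAT-routed private subnets but no NAT of their own.

```python
from typing import Dict, List, Set, Tuple


def cross_az_routes(subnet_az: Dict[str, str],
                    subnet_nat: Dict[str, str],
                    nat_az: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (subnet, subnet AZ, NAT, NAT AZ) for each cross-AZ NAT route."""
    mismatches = []
    for subnet, nat in sorted(subnet_nat.items()):
        if subnet_az[subnet] != nat_az[nat]:
            mismatches.append((subnet, subnet_az[subnet], nat, nat_az[nat]))
    return mismatches


def azs_missing_nat(subnet_az: Dict[str, str],
                    subnet_nat: Dict[str, str],
                    nat_az: Dict[str, str]) -> Set[str]:
    """AZs that host NAT-routed subnets but have no NAT gateway of their own."""
    routed_azs = {subnet_az[subnet] for subnet in subnet_nat}
    return routed_azs - set(nat_az.values())
```

A mismatch in the first check is usually fixed by pointing the subnet's route table at its own AZ's NAT. A result from the second check means that NAT has to be created first.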
NAT gateway cost management is one of the most consistently rewarding FinOps disciplines. Nova AI Ops integrates with VPC traffic data and NAT metrics, surfaces top-spending NATs, attributes their traffic to workloads, and produces the optimization queue that the network team uses to drive savings.