The Vendor Egress Cost Watch
Sending telemetry to a vendor incurs egress fees. This piece covers the cost model, the trade-offs of in-region collectors, and the watch that catches surprises.
Cost model
Vendor egress cost is an often-overlooked line item that appears when telemetry is shipped to SaaS observability vendors. Logs, metrics, and traces routed to Datadog, New Relic, Honeycomb, or similar tools cross the cloud provider's egress boundary, and the egress charge comes on top of the vendor's per-GB ingestion charge. Without active management, egress can grow into a significant share of the total observability bill.
What the cost model looks like:
- Cloud provider egress: $0.05 to $0.12 per GB.: AWS egress to the public internet costs roughly $0.05 to $0.09 per GB depending on volume tier; GCP and Azure rates are broadly similar, with the upper end of the range reflecting the higher list rates in some tiers and regions. Telemetry shipped to vendors crosses this boundary, and the charge applies per GB to the bytes actually sent, that is, after any compression.
- Telemetry volume can be 1 to 10 TB per month.: A medium-sized service produces gigabytes of logs, metrics, and traces daily. Summed over a month and across all of its instances, a single service can reach 1 to 10 TB.
- $50 to $1,200 per month per service.: Egress alone, before vendor charges. At 1 TB per month and $0.05 per GB, a small service pays about $50; at 10 TB and $0.12 per GB, a large service pays about $1,200. Across many services, the total egress bill becomes meaningful.
- Vendor charges add on top.: The vendor (Datadog, New Relic, or similar) charges for ingestion, indexing, and retention. The vendor bill is typically 5 to 10 times the egress bill, so egress adds roughly 10 to 20 percent on top of it: a non-trivial fraction.
- Total observability cost is the sum.: Egress plus vendor charges plus internal collection infrastructure. Looking only at the vendor bill misses the full picture; the egress cost is part of the same decision.
The cost model is the foundation. Without understanding the egress dimension, optimization focuses on the wrong levers.
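The arithmetic behind the figures above can be sketched in a few lines. This is a minimal illustration, not a billing tool: the function names are hypothetical, 1 TB is assumed to be 1,000 GB, and the vendor multiple and the $300 collection-infrastructure figure in the usage section are made-up inputs.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes, matching the article's round numbers


def monthly_egress_cost(volume_tb: float, rate_per_gb: float) -> float:
    """Dollars of cloud egress for one month of telemetry."""
    return volume_tb * GB_PER_TB * rate_per_gb


def total_observability_cost(egress: float, vendor: float, collection_infra: float) -> float:
    """The full bill: egress plus vendor charges plus internal collection infrastructure."""
    return egress + vendor + collection_infra


if __name__ == "__main__":
    # The two ends of the per-service range quoted in the list above.
    small = monthly_egress_cost(volume_tb=1, rate_per_gb=0.05)
    large = monthly_egress_cost(volume_tb=10, rate_per_gb=0.12)
    print(f"egress range: ${small:,.0f} to ${large:,.0f} per month")

    # Hypothetical large service: vendor bill at 5x egress, $300 of collector hosts.
    vendor = 5 * large
    total = total_observability_cost(large, vendor, collection_infra=300)
    print(f"egress share of total: {large / total:.0%}")
```

Running the same calculation per service, with real volumes and the rate tier the account actually pays, shows where egress is large enough to be worth optimizing.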
Reductions
The strategies for reducing vendor egress cost are well-known. The discipline is in applying them and revisiting as telemetry volume grows.
- In-region collectors that compress before egress.: An OpenTelemetry collector or similar in-region aggregator receives raw telemetry, compresses it, and sends it to the vendor. Egress is charged on the compressed bytes, so the savings track the compression ratio.
- 5 to 10x reduction in egress bytes.: Logs and traces compress well; gzip ratios of 5 to 10x are common for text-heavy telemetry. The egress cost drops proportionally while the vendor receives the same data; only the bytes on the wire change.
- Sampling at the collector.: Trace sampling at the collector ships only a fraction of traces. Head-based sampling decides when a trace starts; tail-based sampling decides after the trace completes, so it can favor errors and slow requests. Either reduces volume significantly. The team chooses the sampling strategy that preserves the value of the data.
- Filtering at the collector.: Logs that are never queried do not justify the egress cost. The collector filters: drop debug logs in production, drop heartbeat logs, drop logs from specific noisy components. The savings compound across many sources.
- Don't ship what you wouldn't query.: The discipline is the same as for log retention generally. If a log line is not going to be queried, why ship it? The filtering happens at the collector; the egress cost goes to zero for filtered data.
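The filtering and sampling decisions described in the list above can be sketched as two small predicates. This is an illustration of the logic only, not the OpenTelemetry collector's configuration or API: the record fields (`level`, `component`), the drop lists, and the function names are all assumptions. The sampler shown is head-based, deciding from the trace ID alone.

```python
import hashlib

# Hypothetical drop rules; real ones come from what the team actually queries.
DROP_LEVELS = {"DEBUG"}
NOISY_COMPONENTS = {"heartbeat"}


def keep_log(record: dict) -> bool:
    """Return False for log records that should never leave the region."""
    if record.get("level") in DROP_LEVELS:
        return False
    if record.get("component") in NOISY_COMPONENTS:
        return False
    return True


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Head-based sampling: a deterministic keep/drop decision per trace ID.

    Hashing the ID means every span of a trace gets the same decision,
    and roughly `sample_rate` of all traces are kept.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < sample_rate * 2**64


if __name__ == "__main__":
    logs = [
        {"level": "DEBUG", "component": "api", "msg": "cache miss"},
        {"level": "INFO", "component": "heartbeat", "msg": "alive"},
        {"level": "ERROR", "component": "api", "msg": "upstream timeout"},
    ]
    shipped = [r for r in logs if keep_log(r)]
    print(f"shipping {len(shipped)} of {len(logs)} log records")
    kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
    print(f"kept {kept} of 10000 traces")
```

Tail-based sampling needs the whole trace buffered before deciding, so it does not reduce to a stateless predicate like this one.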
The reductions are well-known and high-leverage. Applying them produces immediate, ongoing cost savings.
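Before relying on a 5 to 10x figure, it is worth measuring the ratio on a sample of real payloads. The sketch below uses Python's standard `gzip` module; the helper names are hypothetical, and the synthetic batch is far more repetitive than production logs, so the ratio it produces is not representative.

```python
import gzip
import json


def gzip_ratio(payload: bytes) -> float:
    """Raw size divided by gzip-compressed size for one batch."""
    return len(payload) / len(gzip.compress(payload))


def egress_cost_after_compression(raw_cost: float, ratio: float) -> float:
    """Egress is billed on bytes sent, so the cost scales down by the ratio."""
    return raw_cost / ratio


def synthetic_batch(n: int = 1000) -> bytes:
    """Made-up JSON log lines, standing in for a captured batch of real telemetry."""
    lines = [
        json.dumps({"ts": 1700000000 + i, "level": "INFO",
                    "service": "checkout", "msg": f"order {i} processed"})
        for i in range(n)
    ]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    ratio = gzip_ratio(synthetic_batch())
    print(f"measured gzip ratio: {ratio:.1f}x")
    # The article's range applied to a $1,200 per month uncompressed egress bill.
    for assumed in (5, 10):
        cost = egress_cost_after_compression(1200, assumed)
        print(f"at {assumed}x: ${cost:,.0f} per month")
```

Swapping `synthetic_batch()` for a batch captured from the collector gives the ratio that the cost estimate should actually use.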
Watch
Egress cost grows when nobody watches it. New services ship more telemetry, the egress climbs, and the bill arrives at month-end as a surprise. Active monitoring catches growth before it bites.
- Per-service egress dashboard.: Each service's vendor egress is tracked separately. The dashboard shows trends; sudden growth is visible; the per-service view supports accountability.
- Cost owner per service.: Each service has a cost owner. The owner sees their service's egress; the owner is accountable for keeping it bounded. Without ownership, costs grow without anyone noticing.
- Surprises: a service that doubled its volume in a week.: Sudden volume growth is a signal. Was a new feature deployed? Did the logging level change? Did a bug start logging excessively? The investigation determines the cause.
- Tag the cause.: Once the cause is identified, it is tagged. Was it intentional (a new feature with legitimate logging)? Was it a bug (excessive logging that needs fixing)? The tagging supports future analysis.
- Act.: Intentional growth may warrant filtering tweaks; bugs need fixing; the response is matched to the cause. The discipline is acting promptly; the longer growth persists, the larger the bill.
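The surprise check in the list above amounts to a week-over-week comparison per service. A minimal sketch, assuming per-service egress totals in GB are already available from billing or collector metrics; the `Surprise` record, the 2x threshold default, and the cause labels are illustrative choices, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class Surprise:
    service: str
    previous_gb: float
    current_gb: float
    growth: float            # current week divided by previous week
    cause: str = "untagged"  # set to "intentional" or "bug" after investigation


def find_surprises(weekly_gb: dict, threshold: float = 2.0) -> list:
    """Return services whose egress grew by at least `threshold`x week over week.

    `weekly_gb` maps service name to (previous_week_gb, current_week_gb).
    """
    surprises = []
    for service, (previous_gb, current_gb) in weekly_gb.items():
        if previous_gb <= 0:
            # A brand-new sender has no baseline; flag it if it ships anything.
            growth = float("inf") if current_gb > 0 else 0.0
        else:
            growth = current_gb / previous_gb
        if growth >= threshold:
            surprises.append(Surprise(service, previous_gb, current_gb, growth))
    # Largest jumps first, so the queue starts with the worst offender.
    return sorted(surprises, key=lambda s: s.growth, reverse=True)


if __name__ == "__main__":
    usage = {"checkout": (100.0, 210.0), "search": (300.0, 320.0)}
    for s in find_surprises(usage):
        print(f"{s.service}: {s.previous_gb:.0f} GB -> {s.current_gb:.0f} GB ({s.cause})")
```

Each flagged service would go to its cost owner, who sets `cause` once the investigation is done and then acts on it.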
The vendor egress cost watch is one of those FinOps disciplines that pay off in proportion to the rigor applied. Nova AI Ops integrates with cloud cost data and observability vendor billing, surfaces per-service egress trends, and produces the per-team queue that drives cost discipline at the team level.