Observability · Intermediate · By Samson Tanimawo, PhD · Published Sep 6, 2026 · 5 min read

Log Retention Economics: How Long Should You Keep Logs?

30 days? 90 days? 7 years? The right answer depends on incident frequency, regulatory requirements, and what your retention bill looks like at scale. Here is a framework that gets the trade-offs right.

The question to ask first

"How long do we need this log searchable for an active incident?" If the answer is "12 hours" the log doesn't need 90-day hot retention. Most logs over-retain by an order of magnitude.

The retention default is wrong. Most teams set retention by what the vendor allows or what feels comfortable, not by what the team needs. The result: 90-day hot retention for logs the team queries for 24 hours and never again. The cost difference between 24-hour and 90-day hot retention is roughly 4x; teams pay 4x for storage they don't use.

The honest question. For each log type, ask "how long after ingestion do we last query this?" Most application logs follow a steep decay: queried heavily for 12 hours, occasionally for 7 days, almost never after 30. Match retention to the query pattern.

Four retention tiers

Most modern stacks support tiering with different cost and query speeds. Use the tiers; don't pay hot prices for cold use cases. The four-tier structure (hot/warm/cold/archive) maps cleanly to four query patterns and four price points.

The discipline. Each log source gets a retention window per tier. Application logs: 1-3 days hot, 30 days cold. Audit logs: 30 days hot, 7 years archive. Different tiers for different needs; no single policy serves all.
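A minimal sketch of such a per-source policy table, with illustrative windows taken from the examples above. The `RetentionPolicy` type and `tier_for` helper are hypothetical, not any vendor's API:

```python
# Per-source retention policy: how many days each log lives in each tier.
# Windows below are illustrative, mirroring the article's examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class RetentionPolicy:
    hot_days: int      # sub-second full-text search
    warm_days: int     # ~5 s queries, postmortems and escalations
    cold_days: int     # minute-scale batch queries
    archive_days: int  # object storage, compliance

RETENTION = {
    "app":   RetentionPolicy(hot_days=3,  warm_days=30, cold_days=180, archive_days=0),
    "audit": RetentionPolicy(hot_days=30, warm_days=90, cold_days=365, archive_days=7 * 365),
}

def tier_for(source: str, age_days: int) -> str:
    """Return which tier a log of the given age should live in."""
    p = RETENTION[source]
    if age_days <= p.hot_days:
        return "hot"
    if age_days <= p.warm_days:
        return "warm"
    if age_days <= p.cold_days:
        return "cold"
    if age_days <= p.archive_days:
        return "archive"
    return "deleted"
```

Keeping the policy in one explicit table like this makes the annual review concrete: each row is a decision someone can defend or change.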

Hot tier (0-3 days)

Sub-second query, full text search. Most expensive. Hot is for active incident response. After 72 hours, almost no logs are queried.

The pattern. During an incident, the team queries logs heavily; after resolution, queries drop sharply. By 72 hours after an incident, query rate is 1% of peak. The hot tier exists to serve this active-response window; beyond it, the cost isn't justified.

The pricing. Cloud-native logging platforms (Elasticsearch, Splunk, Datadog) charge roughly $0.50-$2.00/GB-month for hot storage. A team ingesting 100GB/day pays $50-$200/month for each day of hot retention it keeps: $350-$1,400/month at 7 days, $1,500-$6,000/month at 30. The compounding is steep.
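The arithmetic above can be sketched as follows. The function name and the steady-state assumption (constant ingest, storage cost only) are mine; real vendor bills often add per-GB ingest fees on top:

```python
# Steady-state hot-tier storage cost: data retained = ingest rate x window.
# Prices are the per-GB-month range quoted in the text, not a vendor quote.
def hot_monthly_cost(ingest_gb_per_day: float,
                     retention_days: int,
                     price_per_gb_month: float) -> float:
    """Monthly hot storage cost once retention has filled up."""
    stored_gb = ingest_gb_per_day * retention_days
    return stored_gb * price_per_gb_month

low_7d   = hot_monthly_cost(100, 7, 0.50)   # low end of the 7-day range
high_30d = hot_monthly_cost(100, 30, 2.00)  # high end of the 30-day range
```

The linearity is the point: every extra day of hot retention adds a full day's ingest to the bill, forever.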

Warm tier (3-30 days)

5-second query, full text search. ~5x cheaper than hot. Warm is for postmortem analysis and customer escalations.

The use case. Postmortems are written 5-10 days after the incident; the team needs to query historical logs but doesn't need sub-second response. Customer escalations sometimes reference incidents from a few weeks ago; the team needs query access but not real-time speed.

The implementation. Most modern logging platforms support automatic tiering: logs roll from hot to warm at 3 days. Queries automatically use the right tier based on time range. Engineers don't need to think about it; the system manages it.
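One concrete mechanism is Elasticsearch's index lifecycle management (ILM). A policy along these lines (the policy name, thresholds, and actions are illustrative, not a recommendation) rolls an index to warm at 3 days and deletes it at 30:

```json
PUT _ilm/policy/app-logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": { "forcemerge": { "max_num_segments": 1 } }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Equivalent lifecycle rules exist in most platforms (e.g. object-storage lifecycle transitions); the shape is the same: age thresholds trigger tier moves.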

Cold tier (30-180 days)

Minute-scale query, often by date partition. ~50x cheaper than hot. Cold is for trend analysis and quarterly reviews.

The use case. Quarterly business reviews look at 90-day trends. Compliance audits review 180-day windows. Both can tolerate slow queries (the analyst is not in real-time conversation with the data); both need full data, not samples.

The query pattern. Cold-tier queries are batch jobs: they run for minutes and return aggregated results. Engineers who try to use cold for incident response get frustrated; that's not what cold is for.

Archive tier (180 days+)

Object storage. Hours to query. ~500x cheaper than hot. Archive is for compliance and rare deep-dive forensics.

The use case. Regulators ask for "all transactions from Q3 2023." Security investigators need "all logs from this user account ever." Both require full historical data; both can wait hours for query results.

The implementation. Logs roll from cold to archive (S3 / GCS / Azure Blob) automatically. Query is via SQL-on-blob (Athena, BigQuery, etc.). The cost is storage-only ($0.02/GB-month) until queried; query cost is per-scan and modest.
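A sketch of the archive economics: storage is cheap until you query. The $0.02/GB-month figure is from the text; the $5/TB scan price is Athena's published on-demand rate (verify against current pricing), and the function names are mine:

```python
# Archive tier: pay a small storage fee to hold data,
# a per-scan fee only when a query actually runs.
def archive_monthly_storage(stored_gb: float,
                            price_per_gb_month: float = 0.02) -> float:
    """Monthly cost just to hold the data in object storage."""
    return stored_gb * price_per_gb_month

def scan_cost(scanned_tb: float, price_per_tb: float = 5.0) -> float:
    """One-off cost of a SQL-on-blob query scanning this much data."""
    return scanned_tb * price_per_tb

# One year of 100GB/day ingest is 36,500GB (~36.5TB) sitting in archive.
year_gb = 100 * 365
holding = archive_monthly_storage(year_gb)
```

Partitioning the archive by date keeps `scanned_tb` small: a query for Q3 2023 scans only those partitions, not the whole bucket.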

How to decide where each log lives

For each log source: how often is it queried after 24 hours? After 7 days? After 30 days? Plot the curve. Tier accordingly. Most application logs follow a steep decay; most security/audit logs follow a slower decay. Don't apply one retention to both.

The decay analysis. Pull query metadata from your logging platform. For each log source, count queries per age bucket (0-24h, 1-7d, 7-30d, 30+d). The query distribution tells you where the value is.
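The decay analysis above can be sketched as follows, assuming query metadata has been exported as `(log_source, age_of_logs_queried_in_days)` pairs; the export step and field names are hypothetical and vary by platform:

```python
# Count queries per age bucket, per log source, to see the decay curve.
from collections import Counter

BUCKETS = [(1, "0-24h"), (7, "1-7d"), (30, "7-30d"), (float("inf"), "30d+")]

def bucket(age_days: float) -> str:
    """Map the age of the logs a query targeted to a bucket label."""
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return BUCKETS[-1][1]  # unreachable: the last bucket is unbounded

def decay_profile(queries):
    """queries: iterable of (source, age_days). Returns counts per bucket."""
    counts: dict[str, Counter] = {}
    for source, age_days in queries:
        counts.setdefault(source, Counter())[bucket(age_days)] += 1
    return counts
```

If "0-24h" dominates and "30d+" is near zero for a source, that source is a candidate for short hot retention and an aggressive roll to cold.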

The output. A retention policy per log source: log_source X is hot for 1 day, warm for 14 days, cold for 60 days, archive for 7 years. Document this; update annually as patterns change.

Common antipatterns

One retention policy for all logs. "We retain everything for 90 days." Application logs at 90-day hot are over-retained by roughly 30x. Audit logs deleted at 90 days are under-retained by years; compliance often requires 7. Different needs, different policies.

The vendor's default. "We use whatever the vendor's default is." Vendors default to higher retention because it makes them money. Pick deliberately.

No annual review. Retention policies set once and forgotten. Volume grows; cost compounds. Annual review keeps the policy aligned with use.

Avoiding cold storage out of fear. "What if we need to query?" You can. It just takes minutes. The savings (50x) are worth the wait for the rare 30-day-old query.

What to do this week

Three moves. (1) Compute your monthly logging bill. Most teams underestimate this by 2-3x. (2) For your top 3 log sources, plot the query-by-age curve. Match the retention to the curve. (3) If you're on a single retention tier, configure tiering. Most modern platforms do this automatically once enabled.