First Thanos Install
Long-term Prometheus.
Overview
The first Thanos install is the moment Prometheus moves from local-only retention to long-term object-storage-backed metrics. Thanos provides global query and S3-backed retention; the install produces multi-year metric history without losing PromQL compatibility.
- Long-term Prometheus. Object storage retention; historical analysis becomes possible across multi-year windows.
- Global query. Query across multiple Prometheus instances; federated visibility from a single PromQL query.
- Sidecar mode plus PromQL. Thanos sidecar uploads blocks to S3; queries use the same PromQL the team already knows.
- Compaction. Older data downsampled to lower resolution; cost-efficient retention without losing trend visibility.
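Because Thanos Query serves the Prometheus-compatible HTTP API, a global query can be sketched as an ordinary instant-query URL. The snippet below is a minimal illustration, not a client library: the hostname and port are placeholders, and the `dedup` and `max_source_resolution` parameters are the Thanos query options as commonly documented, so check them against the version you deploy.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_query_url(base_url: str, promql: str, dedup: bool = True,
                    max_source_resolution: str = "0s") -> str:
    """Build an instant-query URL for a Prometheus-compatible /api/v1/query endpoint.

    dedup and max_source_resolution are Thanos-specific query parameters
    (assumed here; verify against your Thanos version).
    """
    params = {
        "query": promql,
        "dedup": "true" if dedup else "false",
        "max_source_resolution": max_source_resolution,
    }
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Hypothetical endpoint; replace with the address of your Thanos Query instance.
url = build_query_url(
    "http://thanos-query.example:10902",
    "sum by (cluster) (up)",
    max_source_resolution="5m",
)

# Round-trip the URL to confirm the PromQL expression survives encoding unchanged.
decoded = parse_qs(urlparse(url).query)
print(url)
print(decoded["query"][0])
```

The PromQL expression is the same one the team would send to a single Prometheus; only the endpoint and the two optional parameters differ.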
The approach
The practical approach is sidecar mode first, an S3 backend for storage, retention policies tuned per resolution, compaction for cost, and a documented topology. That discipline produces a stable Thanos deployment that survives operator turnover.
- Sidecar mode. Thanos sidecar runs alongside each Prometheus; matches existing infrastructure without re-architecting.
- S3 backend. Cheap object storage; for long-tail retention, the choice of storage class shapes the cost alongside data volume.
- Retention policies. Per-resolution retention shapes the storage bill; for example, raw samples kept for 30 days, 5m downsamples for 1 year, and 1h downsamples indefinitely.
- Compaction plus documented topology. Older data is downsampled; each component’s role is committed to the repo so investigations start from a known layout.
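The pieces above can be sketched as the command lines for the sidecar and the compactor, plus the shared bucket configuration. This is an illustrative sketch under stated assumptions: the bucket name, endpoint, file paths, and Prometheus URL are hypothetical placeholders, and the flag names follow the Thanos CLI as commonly documented, so confirm them with `thanos sidecar --help` and `thanos compact --help` for your version.

```python
import shlex

# Example retention policy from the text: raw 30 days, 5m one year, 1h indefinitely.
# "0d" is the compactor's convention for "keep forever".
RETENTION = {"raw": "30d", "5m": "365d", "1h": "0d"}

# Object storage config shared by sidecar and compactor (placeholder values).
BUCKET_CONFIG_YAML = """\
type: S3
config:
  bucket: example-thanos-metrics
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
"""


def sidecar_args(prometheus_url: str, tsdb_path: str, objstore_config_file: str) -> list:
    """Arguments for a sidecar that uploads Prometheus TSDB blocks to the bucket."""
    return [
        "thanos", "sidecar",
        f"--prometheus.url={prometheus_url}",
        f"--tsdb.path={tsdb_path}",
        f"--objstore.config-file={objstore_config_file}",
    ]


def compactor_args(data_dir: str, objstore_config_file: str, retention: dict) -> list:
    """Arguments for a long-running compactor with per-resolution retention."""
    args = [
        "thanos", "compact", "--wait",
        f"--data-dir={data_dir}",
        f"--objstore.config-file={objstore_config_file}",
    ]
    for resolution in ("raw", "5m", "1h"):
        args.append(f"--retention.resolution-{resolution}={retention[resolution]}")
    return args


# Hypothetical paths and URL, printed as shell-ready command lines.
print(shlex.join(sidecar_args("http://localhost:9090", "/prometheus", "/etc/thanos/bucket.yml")))
print(shlex.join(compactor_args("/var/thanos/compact", "/etc/thanos/bucket.yml", RETENTION)))
```

Keeping the retention values in one dictionary makes the policy easy to review alongside the topology notes in the repo. For sidecar uploads, Prometheus is typically run with its minimum and maximum block durations set to the same value so that local compaction does not interfere with uploaded blocks.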
Why this compounds
Thanos discipline compounds across years. Each year of metric retention grows the team’s investigation capability; trend analysis spans quarters instead of weeks.
- Long-term metrics. Historical analysis spans multi-year windows; trend visibility for capacity planning becomes routine.
- Multi-cluster visibility. Global query supports many clusters; federated operations from a single dashboard.
- Cost efficiency. S3 plus downsampling and per-resolution retention keeps long-term storage affordable; tiers with finite retention stop growing, and only data kept indefinitely continues to accumulate.
- Institutional knowledge. Each query teaches monitoring patterns; the team’s observability muscle grows.
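The cost-efficiency point can be made concrete with a small back-of-the-envelope estimate. The sketch below is illustrative only: the daily byte figures per resolution are invented placeholders, not measurements, and should be replaced with numbers observed in your own bucket. The retention values mirror the example policy of raw for 30 days, 5m for 1 year, and 1h indefinitely.

```python
from typing import Optional

# Hypothetical daily bytes written per resolution tier (placeholders, not measurements).
DAILY_BYTES = {"raw": 50e9, "5m": 10e9, "1h": 1e9}

# Retention per tier in days; None means the tier is kept indefinitely.
RETENTION_DAYS = {"raw": 30, "5m": 365, "1h": None}


def tier_bytes(daily_bytes: float, retention_days: Optional[int], age_days: int) -> float:
    """Bytes held for one tier once the install has been running for age_days."""
    kept_days = age_days if retention_days is None else min(age_days, retention_days)
    return daily_bytes * kept_days


def total_bytes(age_days: int) -> float:
    """Sum of bytes held across all tiers at a given install age."""
    return sum(
        tier_bytes(DAILY_BYTES[tier], RETENTION_DAYS[tier], age_days)
        for tier in DAILY_BYTES
    )


# Show how the stored volume develops as the install ages.
for years in (1, 3, 5):
    print(f"year {years}: {total_bytes(365 * years) / 1e12:.3f} TB")
```

Under these assumed inputs, the raw and 5m tiers plateau once their retention windows fill, and later growth comes only from the 1h tier.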
The first Thanos install is an infrastructure investment that pays off across years. Nova AI Ops integrates with metrics telemetry, surfaces patterns, and supports the team’s monitoring discipline.