Loki vs Elasticsearch
Logging.
Loki strengths
Loki's strengths are cost, operational simplicity, and Grafana fit. The label-index model is the core trade: small index, cheap object storage, fast label queries, slower content scans.
- Cost. Roughly one to five dollars per GB per month versus twenty to fifty for Elasticsearch. Loki indexes labels only; raw logs sit on cheap object storage such as S3 or GCS.
- Operational simplicity. No shard tuning, no node sizing, no snapshot scheduling. The stateless query layer scales horizontally; storage scales with the underlying object store rather than with the cluster.
- Tight Grafana integration. Loki and Grafana come from the same vendor. LogQL is modeled on PromQL, so it feels native in Grafana-stack shops where dashboards already pivot on labels.
- Bounded label set. Keep a curated label list for each cluster. That discipline preserves the cost model; high-cardinality labels are the fastest way to lose it.
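To make the label-discipline point concrete, here is a minimal sketch of how stream count grows with label cardinality. It assumes the worst case, where every combination of label values occurs, so the result is an upper bound rather than a measurement. The label names and value counts are illustrative assumptions, not figures from any real cluster.

```python
from math import prod


def estimated_streams(label_values: dict[str, int]) -> int:
    """Upper bound on distinct streams: the product of distinct values per label."""
    return prod(label_values.values())


# Hypothetical curated label set: label name -> number of distinct values.
curated = {"cluster": 3, "namespace": 20, "app": 50, "level": 4}

# The same set after a high-cardinality label slips in.
with_user_id = {**curated, "user_id": 100_000}

print(estimated_streams(curated))       # 12000
print(estimated_streams(with_user_id))  # 1200000000
```

One extra label multiplies the bound by its own cardinality, which is why a single user-id or trace-id label is enough to undo the small-index advantage.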
Elasticsearch strengths
Elasticsearch's strengths are full-text search, ecosystem maturity, and aggregation. Worth the cost when you actually use them; expensive when you do not.
- Full-text search. Free-form queries across all log fields. Loki matches efficiently only on labels; content searches against Loki must scan the log lines of the selected streams, while Elasticsearch's inverted index resolves a term to its matching documents without reading every line.
- Mature ecosystem. Kibana, Beats, Logstash, and ML features. A long history, a broad community, and a deep integration library that covers the long tail of input formats.
- Aggregation power. Complex aggregations across structured fields in a single query. Useful for analytics-style queries on log data where the question is "how many" rather than "show me lines matching".
- ILM policy. Index lifecycle management (ILM), configured per cluster. The discipline catches storage explosion before it becomes a budget conversation.
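The difference between scanning content and consulting an inverted index can be sketched with a toy example. This is an analogy only: neither function reflects how Loki or Elasticsearch is actually implemented, and the log lines and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log lines standing in for the content of one stream.
LOGS = [
    "payment failed for order 1001",
    "user login ok",
    "payment ok for order 1002",
    "disk usage high on node-3",
]


def scan(lines, term):
    """Scan-style search: inspect every line, so work grows with log volume."""
    return [i for i, line in enumerate(lines) if term in line.split()]


def build_inverted_index(lines):
    """Index-style search: map each token to the ids of the lines containing it."""
    index = defaultdict(set)
    for i, line in enumerate(lines):
        for token in line.split():
            index[token].add(i)
    return index


def lookup(index, term):
    """Resolve a term through the index without touching the raw lines."""
    return sorted(index.get(term, set()))


INDEX = build_inverted_index(LOGS)
print(scan(LOGS, "payment"))     # [0, 2]
print(lookup(INDEX, "payment"))  # [0, 2]
```

Both calls return the same lines. The index pays its cost up front, at ingest time and in storage, which is the same trade the two systems make in opposite directions.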
How to decide
The decision is shape-driven. Grafana-stack, full-text-heavy, and compliance-bound organizations each point to a different answer; the wrong pick produces years of friction.
- Kubernetes-heavy, Grafana-stack, cost-sensitive. Pick Loki. The ecosystem is converging here, and the cost model rewards label discipline.
- Heavy full-text search. Pick Elasticsearch. Do not migrate to Loki without a clear pain to justify the move; full-text search on Loki is slow.
- Compliance or Kibana-specific needs. Pick Elasticsearch. Some industry-specific tooling and audit workflows assume Elastic and cost weeks to retrofit.
- Team-skill match. Weigh the organization's existing operational expertise. This catches the wrong-tool pick when the spec sheet says one thing and the team's reflexes say another.
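The rules above can be written down as a small decision helper. This is a sketch under stated assumptions: the `OrgProfile` fields and the `suggest_backend` function are hypothetical names, and the precedence (hard requirements first, then stack fit, then team expertise) is one reasonable reading of the list, not a prescribed algorithm.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical summary of an organization's logging shape."""
    grafana_stack: bool
    cost_sensitive: bool
    full_text_heavy: bool
    compliance_or_kibana: bool
    team_expertise: str = ""  # "loki", "elasticsearch", or "" if neither


def suggest_backend(p: OrgProfile) -> str:
    # Hard requirements win: full-text or Kibana-bound workflows point to Elasticsearch.
    if p.full_text_heavy or p.compliance_or_kibana:
        return "elasticsearch"
    # Stack fit and cost sensitivity point to Loki.
    if p.grafana_stack and p.cost_sensitive:
        return "loki"
    # Otherwise fall back to what the team already operates well.
    return p.team_expertise or "undecided"


print(suggest_backend(OrgProfile(True, True, False, False)))  # loki
print(suggest_backend(OrgProfile(True, True, True, False)))   # elasticsearch
```

Putting the full-text and compliance checks first reflects the point that a Grafana-stack shop with a full-text-heavy SOC is still an Elasticsearch shop.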
Hybrid approaches
Hybrid is rarely worth it. Most teams should pick one and stick with it; the operational tax of two log stacks usually exceeds the benefits unless the requirements genuinely demand both.
- Loki hot, Elasticsearch search. Recent logs go to Loki and full-text search runs on Elasticsearch. The operational burden is heavier: both stacks need backups, upgrades, and on-call coverage.
- Operational complexity. Two tools to operate instead of one. Worth it only when specific requirements (compliance retention plus dev-ergonomic search) genuinely cannot collapse to one tool.
- Migration is real work. Query logic, dashboards, and alerting all need rewriting. Do not switch without strong justification; the migration tail is months, not weeks.
- Named owner. Assign a responsible team to each stack, so operational reviews have a target rather than splitting blame across the platform org.
Common pitfalls
The pitfalls are predictable: high-cardinality labels in Loki, Elasticsearch without ILM, and cargo-cult migration. Each one shows up reliably in retros from teams that skipped the planning step.
- Loki high-cardinality labels. Enforce a no-high-cardinality rule on every cluster. The cost model breaks the moment a label like user-id or trace-id slips in; such fields belong in the log content, not in labels.
- Elasticsearch without ILM. Every cluster needs an index-lifecycle policy. Without one, old indices accumulate, storage explodes, and the cluster gets too expensive to upgrade safely.
- Migrating without understanding. The team needs a deep understanding of the tool before committing. Cargo-culting Loki because Grafana said so produces unhappy teams when the queries the SOC actually runs are full-text.
- Cost monitor. Track storage and ingest cost for each stack. This catches drift before the next budget review surfaces a six-figure surprise.
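For the ILM pitfall, a minimal policy is short enough to sketch. The snippet below builds a request body of the shape Elasticsearch accepts at `PUT _ilm/policy/<name>`: roll over the write index in the hot phase, then delete old indices. The thresholds (7 days, 50 GB, 30 days) and the `ilm_policy` helper are assumptions for illustration; real values depend on retention requirements and shard sizing.

```python
import json


def ilm_policy(rollover_age="7d", rollover_shard_size="50gb", delete_after="30d"):
    """Build a minimal ILM policy body: roll over in hot, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new index once either threshold is reached.
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": rollover_shard_size,
                        }
                    }
                },
                "delete": {
                    # Age is measured from rollover, not from index creation.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(json.dumps(ilm_policy(), indent=2))
```

The policy only takes effect on indices that reference it, typically through an index template, so attaching it is a separate step from defining it.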