Set Up Elasticsearch

Full-text search for logs.

Overview

Standing up Elasticsearch (or its open-source fork, OpenSearch) plus Kibana (OpenSearch Dashboards on the OpenSearch side) gives a team full-text log search, structured aggregations, and the dashboards to actually use both. The work that matters is not node count; it is index lifecycle, mapping discipline, and choosing managed versus self-hosted before the first gigabyte lands.

Full-text log search. Free-text queries with relevance ranking across millions of lines. The investigation tool ops engineers reach for first.
Aggregations on structured fields. Sums, counts, and percentiles bucketed by service, status code, or region. The summarisation layer that Kibana visualisations sit on.
Kibana for dashboards and Discover. Saved searches, dashboards, alerting. The UI through which most engineers learn the stack.
Index lifecycle plus cluster topology. Hot/warm/cold/delete tiers from day one; deliberate master/data/ingest role split as the cluster grows.
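
To illustrate the first two capabilities together, the sketch below builds a search request body that pairs a relevance-ranked full-text match with a per-service aggregation. The helper name `build_log_search` and the field names (`message`, `service`, `latency_ms`, `@timestamp`) are assumptions made for this example, not a standard schema; `service` is assumed to be mapped as a `keyword` field.

```python
import json


def build_log_search(text, since="now-1h", top_services=10):
    """Build a body for the _search endpoint (hypothetical field names)."""
    return {
        "size": 20,  # return the 20 best-scoring log lines
        "query": {
            "bool": {
                # `match` analyses the text and ranks hits by relevance.
                "must": [{"match": {"message": text}}],
                # `filter` narrows the time window without affecting the score.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        "aggs": {
            # Bucket the matching lines by service ...
            "by_service": {
                "terms": {"field": "service", "size": top_services},
                "aggs": {
                    # ... and compute latency percentiles inside each bucket.
                    "latency_percentiles": {
                        "percentiles": {
                            "field": "latency_ms",
                            "percents": [50, 95, 99],
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_search("connection timeout"), indent=2))
```

The printed JSON can be pasted into Kibana's Dev Tools console against an index's `_search` endpoint; the same body shape is what a Kibana visualisation would issue on the team's behalf.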

The approach

Four habits make Elasticsearch a reliable platform rather than a recurring 3am page: managed when possible, index lifecycle management (ILM) configured before ingest starts, index templates that enforce mapping consistency, and cluster planning backed by health monitoring.

Managed when possible. Elastic Cloud or Amazon OpenSearch Service. The operational tax of self-hosting is real; pay it only for a clear reason.
ILM from day one. Hot, warm, cold, delete tiers configured before the first index ships (on OpenSearch, the equivalent feature is Index State Management, ISM). Retrofitting lifecycle policies under load is painful.
Index templates and consistent mappings. Standard mappings per data source so field types stay stable. Mapping explosions, where unchecked dynamic fields balloon the mapping, are a recurring source of outages.
Cluster planning plus health monitoring. Master/data/ingest role split for scale; cluster health, JVM heap usage, and query latency on the standing dashboard.
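
The ILM and index-template habits above can be sketched as two request bodies: one for `PUT _ilm/policy/<name>` and one for `PUT _index_template/<name>`. This is a minimal sketch, assuming a recent Elasticsearch 7.x or 8.x with data streams; OpenSearch's ISM uses a different policy schema. The policy name, index pattern, phase ages, and field names are all hypothetical values chosen for illustration, not recommendations.

```python
import json

POLICY_NAME = "app-logs-policy"  # hypothetical policy name


def build_ilm_policy(warm_after="7d", cold_after="30d", delete_after="90d"):
    """Body for PUT _ilm/policy/<name>; ages are illustrative assumptions."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    # Roll over daily, or earlier if a primary shard gets large.
                    "actions": {
                        "rollover": {
                            "max_age": "1d",
                            "max_primary_shard_size": "50gb",
                        }
                    },
                },
                "warm": {
                    "min_age": warm_after,
                    # Merge to one segment once the index is no longer written to.
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                "cold": {
                    "min_age": cold_after,
                    # Recover cold indices last after a node restart.
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_index_template(policy_name=POLICY_NAME):
    """Body for PUT _index_template/<name>; pattern and fields are hypothetical."""
    return {
        "index_patterns": ["app-logs-*"],
        "data_stream": {},  # back the pattern with a data stream for rollover
        "priority": 200,
        "template": {
            "settings": {"index.lifecycle.name": policy_name},
            "mappings": {
                # Unknown fields stay in _source but are not added to the
                # mapping, which guards against mapping explosions.
                "dynamic": False,
                "properties": {
                    "@timestamp": {"type": "date"},
                    "message": {"type": "text"},
                    "service": {"type": "keyword"},
                    "region": {"type": "keyword"},
                    "status_code": {"type": "short"},
                    "latency_ms": {"type": "float"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
    print(json.dumps(build_index_template(), indent=2))
```

Keeping both bodies in version control next to the services that emit the logs makes the "configured before ingest starts" rule easy to review: the policy and the template are applied first, and only then is the shipper pointed at the cluster.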

Why this compounds

Each indexed source grows the team's investigation surface. Cross-service patterns become visible; mean time to root cause drops; the platform becomes a primary analysis tool rather than just a search box.

Faster investigation. Full-text search across the whole estate shortens mean time to resolution (MTTR) on log-heavy incidents.
Cross-service visibility. Aggregations reveal patterns no single service dashboard could show.
Retention matched to access. ILM keeps hot data fast and cold data affordable. Storage cost stays predictable.
Year-one investment, year-two habit. The first install is heavy. By year two, onboarding a new log source is a 30-minute task.
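
The "retention matched to access" point can be made concrete with back-of-envelope arithmetic: at steady state, a tier holds roughly daily volume times days spent in the tier times the number of copies. The sketch below assumes hypothetical figures (50 GB of primary data per day, 7 days hot, 23 warm, 60 cold, no replica in cold) and ignores compression and indexing overhead, so it is a planning aid rather than a sizing guarantee.

```python
def tier_storage_gb(daily_primary_gb, days_in_tier, replicas):
    """Steady-state footprint of one tier: daily volume x days held x copies."""
    return daily_primary_gb * days_in_tier * (1 + replicas)


# Hypothetical lifecycle: 7 days hot, then warm until day 30, cold until day 90.
TIERS = {
    "hot": {"days": 7, "replicas": 1},
    "warm": {"days": 23, "replicas": 1},
    "cold": {"days": 60, "replicas": 0},
}


def plan(daily_primary_gb, tiers=TIERS):
    """Return the steady-state size in GB for each tier."""
    return {
        name: tier_storage_gb(daily_primary_gb, t["days"], t["replicas"])
        for name, t in tiers.items()
    }


if __name__ == "__main__":
    sizes = plan(50)
    print(sizes)                # {'hot': 700, 'warm': 2300, 'cold': 3000}
    print(sum(sizes.values()))  # 6000
```

Because the total depends only on ingest rate and phase ages, changing a retention period in the lifecycle policy translates directly into a predictable change in storage.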