Set Up Monitoring Stack

Prometheus + Grafana + Loki.

Overview

Setting up a monitoring stack brings observability online with open-source components. Tool choice transfers; patterns transfer; the discipline of standard tags and per-service instrumentation is the durable investment.

Prometheus, Grafana, Loki. Per-team open-source stack; the modern observability default; vendor-neutral.
Prometheus for metrics. Per-target metrics scrape; the time-series surface; cheap, fast, scriptable.
Grafana for dashboards. Per-team dashboards; the visualisation layer; supports investigation under pressure.
Loki plus per-service instrumentation. Per-cluster log storage; per-service metrics; the trio that covers most investigation paths.

The approach

The practical approach: helm install for the bootstrap, per-service instrumentation as the rollout pattern, standard tags everywhere. The team’s discipline produces useful observability, not just running observability.

helm install kube-prometheus-stack. Standard install; ships Prometheus, Grafana, alertmanager; tune values.yaml after.
Per-service instrumentation. Per-service metrics emit; the rollout pattern that scales as the service catalog grows.
Standard tags. service, env, version on every metric; the foundation of cross-service correlation.
Per-team dashboards. Each team owns its Grafana folder; cross-team views compose from team panels.
Document the stack. Per-cluster configuration committed to the repo; supports investigation and rebuild.

Why this compounds

Monitoring stack discipline compounds across services. Each instrumented service grows the team’s observability surface; cost-per-question falls as the foundation matures.

Faster investigation. Per-service metrics produce fast root cause; MTTR drops because the data is already there.
Better understanding. Per-service metrics teach the team how the service actually behaves under load.
Better cost efficiency. Open-source stack reduces vendor cost; the savings compound across years of growth.
Institutional knowledge. Each instrumented service teaches observability patterns; the team’s muscle grows.

Monitoring stack discipline is an infrastructure investment that pays off across years. Nova AI Ops integrates with monitoring telemetry, surfaces patterns, and supports the team’s observability discipline.