The Dashboard Versioning Discipline
Dashboards in version control. The discipline that prevents 'who changed this' debates.
Dashboards in git
Dashboards in version control end the 'who changed this' debate. The JSON exports cleanly; commits are reviewable; the diff surfaces subtle modifications.
- API export. Grafana, Datadog, and others support API-based JSON export; commit on every change.
- Per-dashboard file. Naming convention
service/purpose.json; engineers grep by service or signal type. - PR review. The diff is reviewable; accidental changes and bad-default propagation surface in code review.
- Source of truth. Git, not the UI; the dashboard does not exist if it is not in git.
CI deployment
CI deploys the dashboard JSON the same way it deploys application code. The pipeline is the only path to production; the UI is read-only.
- On merge. CI deploys the JSON to the platform via API; same pipeline as application deploys.
- UI read-only. Engineers cannot click-edit production dashboards; they must PR.
- Staging first. Test in staging; broken queries or missing variables get caught before production.
- Idempotent. Re-running CI does not break anything; the JSON is the desired state.
Rollback
Git revert restores the previous dashboard. CI redeploys; the change is gone in minutes; the audit trail survives.
- Git revert. Standard revert; CI redeploys; the dashboard returns to the previous state in minutes.
- Without versioning. Rollback requires manual reconstruction from memory; painful and error-prone.
- Audit trail. Git history shows every change, who made it, why; compliance-friendly.
- Forensic value. 'When did this query change' becomes a one-line git log; previously a panel-by-panel hunt.
Templating shared dashboards
Common patterns become template dashboards. Generate per-service instances from one template; the source of truth is the template plus the service inventory.
- Common patterns. Service health, deploy status, error rate; same shape for every service.
- Generators. Jsonnet for Grafana, Terraform for Datadog, custom scripts otherwise.
- Inventory-driven. Template plus service inventory equals all dashboards; adding a service generates its dashboards automatically.
- Ripple updates. New panel in the template appears on every service dashboard at next deploy.
Operating discipline
Versioning is not enough on its own. Ownership, cleanup, and access review are the three disciplines that keep the dashboard set healthy over time.
- Owner per dashboard. Owner team in metadata; ownerless dashboards rot and accumulate.
- Quarterly cleanup. Dashboards not viewed in 6 months flagged; documented as still-needed or retired.
- Access review. Edit permissions on prod dashboards limited to platform engineers; everyone else reads-only or PRs.
- Annual audit. Per-team dashboard count and usage trended; the team owns its surface area.