The Instrumentation Budget Per New Service
Every new service has an observability budget. This article covers the expected metrics, logs, and traces, and the launch gate that enforces them.
Default budget
The instrumentation budget policy is the discipline of pre-allocating observability resources per service. Without a policy, instrumentation is either anemic (services launch under-instrumented) or excessive (services bury the platform in noise). The policy sets sensible defaults and requires justification for deviations; the discipline produces consistent observability across the organization.
What the default budget looks like:
- 20 metrics per service.: A service is allocated 20 distinct metrics by default. The number is enough for golden signals and key business metrics; not so many that the metric database is overwhelmed.
- 5 golden signals.: Latency, traffic, errors, and saturation (the four classic golden signals), plus one team-specific signal (often availability or success rate). These five are the baseline that every service reports.
- 15 business metrics.: The remaining 15 cover service-specific signals: domain-specific events, customer-relevant metrics, capacity indicators. The business metrics tell the team how the service is performing in business terms.
- More requires justification.: Services that need more than 20 metrics submit a justification. A review confirms the metrics are needed, and capacity is then allocated in the metric database.
- Log volume: 100 GB per month default.: 100 GB per month is a reasonable starting point for a typical service. Higher volume requires capacity planning and explicit allocation.
- Higher requires capacity planning.: Services that exceed the default log volume are included in the platform's capacity planning. The logging infrastructure is sized for them, and the extra allocation is recorded in the service's budget.
The default budget is the starting point. The policy is what enforces the discipline; without it, instrumentation drifts.
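The defaults above can be written down as data and checked mechanically. The following is a minimal sketch, assuming a hypothetical `ServiceUsage` record that reports a service's metric count and monthly log volume. The class names, field names, and the `budget_violations` helper are illustrative and not part of any particular platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InstrumentationBudget:
    """Default per-service budget described above (hypothetical structure)."""
    max_metrics: int = 20                 # 5 golden signals + 15 business metrics
    max_log_gb_per_month: float = 100.0   # default log allocation


@dataclass
class ServiceUsage:
    """What a service actually uses, plus any approved extras (assumed fields)."""
    name: str
    metric_count: int
    log_gb_per_month: float
    approved_extra_metrics: int = 0       # granted after a justification review
    approved_extra_log_gb: float = 0.0    # granted through capacity planning


def budget_violations(
    usage: ServiceUsage,
    budget: InstrumentationBudget = InstrumentationBudget(),
) -> List[str]:
    """Return human-readable violations; an empty list means within budget."""
    violations = []

    metric_limit = budget.max_metrics + usage.approved_extra_metrics
    if usage.metric_count > metric_limit:
        violations.append(
            f"{usage.name}: {usage.metric_count} metrics exceeds limit of "
            f"{metric_limit}; submit a justification"
        )

    log_limit = budget.max_log_gb_per_month + usage.approved_extra_log_gb
    if usage.log_gb_per_month > log_limit:
        violations.append(
            f"{usage.name}: {usage.log_gb_per_month} GB/month of logs exceeds "
            f"limit of {log_limit}; capacity planning required"
        )

    return violations
```

A service within the defaults returns an empty list. A service over either limit gets one message per limit until the extra allocation is approved and recorded.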
Launch gate
The launch gate is the discipline that prevents under-instrumented services from reaching production. The gate requires baseline observability before launch; deviations require an explicit waiver.
- Service cannot launch without: golden-signal metrics.: The five golden signals are required. Without them, the service launches blind; the team cannot tell whether the service is healthy. The gate enforces baseline visibility.
- Error logs.: Errors must be logged at ERROR level or higher and shipped to the team's logging infrastructure, so the team can investigate when errors occur.
- Distributed tracing for entry points.: The service's entry points produce distributed traces. The traces flow into the tracing platform; downstream investigation has the data it needs.
- Deviation requires explicit waiver.: Some services genuinely do not need full instrumentation (a one-off batch job, a deprecated service in maintenance). The waiver process accommodates these; the documentation tracks the exception.
- Most teams skip the waiver and instrument properly.: The waiver process is deliberate friction. Most teams find it easier to instrument the service properly than to obtain a waiver, which is the intended outcome.
The launch gate is the discipline that catches gaps before they matter. Day-1 incidents on under-instrumented services are debug-blind; the gate prevents this category of incident.
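The gate's requirements can be expressed as a pre-launch check. This is a minimal sketch, assuming a hypothetical `LaunchManifest` that each service submits before launch. The field names, the `launch_gate` function, and the waiver identifier format are assumptions made for illustration, not an existing tool's API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# The four universal signals; the fifth is chosen by the owning team.
UNIVERSAL_SIGNALS = frozenset({"latency", "traffic", "errors", "saturation"})


@dataclass(frozen=True)
class LaunchManifest:
    """Observability facts a service declares before launch (assumed shape)."""
    service: str
    golden_signals: FrozenSet[str]        # signals the service actually emits
    team_signal: Optional[str]            # e.g. "availability" or "success_rate"
    error_logs_shipped: bool              # error logs reach the logging platform
    entry_points: FrozenSet[str]          # all externally reachable entry points
    traced_entry_points: FrozenSet[str]   # entry points that emit traces
    waiver_id: Optional[str] = None       # set only when a waiver was granted


def launch_gate(manifest: LaunchManifest) -> List[str]:
    """Return the reasons a launch is blocked; an empty list means it may proceed."""
    if manifest.waiver_id:
        # An explicit, documented waiver bypasses the baseline checks.
        return []

    blockers = []

    missing = UNIVERSAL_SIGNALS - manifest.golden_signals
    if missing:
        blockers.append("missing golden signals: " + ", ".join(sorted(missing)))
    if not manifest.team_signal:
        blockers.append("missing team-specific signal")

    if not manifest.error_logs_shipped:
        blockers.append("error logs are not shipped")

    untraced = manifest.entry_points - manifest.traced_entry_points
    if untraced:
        blockers.append("untraced entry points: " + ", ".join(sorted(untraced)))

    return blockers
```

Returning a list of reasons rather than a boolean lets a deployment pipeline show the team exactly what is missing, which keeps fixing the gap cheaper than requesting a waiver.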
Why this matters
The instrumentation budget policy compounds across the organization's lifetime. Each new service starts well-instrumented; legacy services migrate toward the policy; the platform's observability quality rises continuously.
- Without policy, services launch under-instrumented.: The default human behavior is to focus on shipping, not on observability. Without policy, services ship with minimal instrumentation; observability is added reactively after incidents.
- Day-1 incidents are debug-blind.: A service that launches without proper instrumentation faces incidents without observability data. The investigation is slow; the resolution is delayed; the customer experience suffers.
- Policy aligns incentives.: The policy makes proper instrumentation the path of least resistance. Because the launch gate blocks under-instrumented services, instrumenting becomes part of shipping, and services are well-instrumented from launch.
- Engineers instrument to ship, not as an afterthought.: The shift in mindset matters. Instrumentation is part of the development process, not a follow-up. The earlier the instrumentation appears, the cheaper it is to add and the more useful it is at launch.
- The bar rises over time.: Each generation of services raises the bar. Templates improve; defaults strengthen; the discipline compounds. The platform's observability quality becomes sustainable.
An instrumentation budget policy is one of those engineering disciplines that pays off over the platform's lifetime. Nova AI Ops integrates with deployment and observability platforms, surfaces services that are out of policy, and produces the per-service compliance report that the platform team uses to drive instrumentation discipline.