SLO Window Choice
30-day vs 7-day vs 90-day SLO windows.
30 days
The window over which an SLO is computed shapes the entire reliability practice around it. A 30-day rolling window is the default for most teams, and for good reasons: it matches billing cycles, captures most seasonality, and produces a budget large enough to absorb routine incidents without immediately triggering policy.
What 30-day windows give you:
- Standard across the industry: Most published SLAs use 30-day or calendar-month windows. Customer expectations align with the convention, comparison across vendors is straightforward, and compliance frameworks expect the same reporting rhythm.
- Aligns with billing cycles: The customer's monthly invoice maps to the same window as the SLO, and service credits for SLA breaches are calculated per billing cycle. The contractual mechanics are clean.
- Forgiving enough for routine operations: 30 days at 99.9% gives roughly 43 minutes of error budget. A typical month has one or two minor incidents that consume budget but do not exhaust it, so the team can absorb routine operations without the policy firing every month.
- Captures weekly cycles: Most services have weekly traffic patterns. 30 days includes four full cycles, so the metric reflects the full operating distribution rather than a biased sample.
- Default for most teams: If you do not have a specific reason to pick something else, pick 30 days. The default works; deviations require justification.
The 30-day window is the right starting point. Shorter or longer windows are answers to specific operational questions, not improvements on the default.
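The budget figures quoted throughout this piece follow from one line of arithmetic: the allowed downtime is simply (1 - target) times the window length. A minimal sketch (function name is illustrative, not from any particular SLO tool):

```python
def error_budget_minutes(window_days: float, slo_target: float) -> float:
    """Minutes of allowed downtime for a given rolling window and SLO target."""
    total_minutes = window_days * 24 * 60
    return (1 - slo_target) * total_minutes

# The figures quoted in this article, all at a 99.9% target:
print(round(error_budget_minutes(30, 0.999), 1))  # 30 days -> ~43 minutes
print(round(error_budget_minutes(7, 0.999), 1))   # 7 days  -> ~10 minutes
print(round(error_budget_minutes(90, 0.999), 1))  # 90 days -> ~130 minutes
```

The same function also shows why tightening the target is far more expensive than shortening the window: moving from 99.9% to 99.99% cuts the budget tenfold at any window length.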
7 days
A 7-day window produces faster feedback at the cost of smaller error budgets. For high-velocity teams that ship many times a day, the faster feedback matches their operating rhythm.
- Faster feedback on changes: A regression that lands today dominates the SLO numbers within a day rather than being diluted over weeks of history. The team can react before the issue compounds. For continuous-deployment teams, this responsiveness is worth more than the smoothness of a longer window.
- Smaller budgets, less forgiveness: 7 days at 99.9% is about 10 minutes of error budget, so a single incident can exhaust it. The team has to operate more carefully because the budget is smaller, and the policy fires more often.
- High-velocity teams benefit most: Teams shipping 50+ times per day per service get the most value from a 7-day window, because the deploy cadence matches the budget cadence. Slower-moving teams see more noise than signal in a 7-day window.
- Sensitivity to single incidents: A 4-hour outage consumes a much bigger fraction of a 7-day budget than of a 30-day budget, so the visible burn is sharper. This is a feature for teams that want to feel the consequences quickly, and a bug for teams that prefer smoother metrics.
- Internal use, not external SLA: 7-day windows are typically internal performance targets, not customer-facing SLAs, because customers expect monthly numbers. Use 7-day for engineering rhythm; keep 30-day for the public commitment.
7-day windows match the velocity of modern engineering teams. The trade-off is smaller budgets and more frequent policy firing, which the team must be willing to live with.
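The sensitivity point above is easy to quantify: the same incident is a very different fraction of the budget depending on the window. A sketch at the 99.9% target used throughout (the 4-hour outage is the example from the list; the 5-minute blip is an added illustration):

```python
def budget_consumed(incident_minutes: float, window_days: float,
                    slo_target: float = 0.999) -> float:
    """Fraction of the window's error budget consumed by one incident."""
    budget = (1 - slo_target) * window_days * 24 * 60
    return incident_minutes / budget

outage = 4 * 60  # the 4-hour outage from the list above
print(f"4h outage vs 7-day budget:  {budget_consumed(outage, 7):.1f}x the budget")   # 23.8x
print(f"4h outage vs 30-day budget: {budget_consumed(outage, 30):.1f}x the budget")  # 5.6x

blip = 5  # a 5-minute blip, where the windows genuinely diverge
print(f"5-min blip vs 7-day budget:  {budget_consumed(blip, 7):.0%}")   # 50%
print(f"5-min blip vs 30-day budget: {budget_consumed(blip, 30):.0%}")  # 12%
```

Note that a 4-hour outage exhausts both budgets many times over; it is the small, routine incidents where the 7-day window bites and the 30-day window forgives.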
90 days
A 90-day window smooths out month-to-month variation and produces budgets large enough to absorb major incidents. For stable services where the operating profile changes slowly, the longer window produces cleaner signal.
- Slower changes, smoother metrics: Day-to-day noise averages out, and a bad week does not necessarily breach the budget; the team has time to recover. The SLO numbers are more stable and less alarmist.
- Stable services benefit: Mature, slow-moving services where the architecture is settled and the team is small are good candidates for 90-day windows. These services do not need fast feedback because changes are infrequent.
- Larger error budgets: 90 days at 99.9% gives roughly 130 minutes of budget, so even a multi-hour incident does not necessarily exhaust it. The team can absorb significant events without the policy firing.
- Aligns with quarterly business cycles: 90-day windows match quarterly reviews, board cycles, and many compliance reporting periods, so SLO reporting lands on the cadence the rest of the company already uses.
- Slower response to regressions: The trade-off is that a regression landing today takes weeks to move the SLO numbers. The team must rely on alerting rather than the SLO dashboard for incident response; the SLO becomes a long-term scorecard rather than an operational signal.
SLO window choice is not "best" or "worst"; it is "match the team's operating cadence." Nova AI Ops supports per-service window configuration, computes SLO performance over multiple windows simultaneously (so the team can use 7-day for engineering rhythm and 30-day for customer reporting), and surfaces the right number for the right audience without requiring the team to pick one and lose the others.
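Computing an SLO over multiple windows simultaneously does not require multiple pipelines: the same event stream can be aggregated over several trailing spans at once. A minimal sketch of the idea, assuming per-minute good/total counts (the class and its methods are illustrative, not Nova AI Ops' actual implementation):

```python
from collections import deque


class MultiWindowSLO:
    """Track per-minute good/total counts; report availability over several windows."""

    def __init__(self, window_days: tuple[int, ...] = (7, 30, 90)):
        self.window_minutes = {d: d * 24 * 60 for d in window_days}
        # One (good, total) sample per minute, newest last; keep only the longest window.
        self.samples: deque = deque(maxlen=max(self.window_minutes.values()))

    def record_minute(self, good: int, total: int) -> None:
        self.samples.append((good, total))

    def availability(self, window_days: int) -> float:
        n = self.window_minutes[window_days]
        recent = list(self.samples)[-n:]
        good = sum(g for g, _ in recent)
        total = sum(t for _, t in recent)
        return good / total if total else 1.0


slo = MultiWindowSLO()
# Simulate 30 days of traffic at 1000 req/min with one 30-minute total outage early on.
for minute in range(30 * 24 * 60):
    failures = 1000 if 1000 <= minute < 1030 else 0
    slo.record_minute(1000 - failures, 1000)

print(f"7-day:  {slo.availability(7):.4%}")   # outage has aged out of this window
print(f"30-day: {slo.availability(30):.4%}")  # outage still counts against this one
```

The same incident is visible in one window and invisible in another, which is exactly why surfacing the right window to the right audience matters.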