Monthly SLO Review Format
A 30-minute monthly review of SLO health.
Agenda
Without a recurring meeting, an SLO practice slowly drifts from a living number into a static dashboard nobody looks at. The monthly review is the forcing function that keeps the practice alive. Done well, it is short, focused, and produces concrete decisions. Done poorly, it becomes status theater that consumes time without changing anything.
The agenda that holds up:
- Per service: SLO performance vs. target: One slide or one dashboard view per service. Last month's actual SLO compared to the published target. Green if met, red if missed, yellow if borderline. The list is sorted so the riskiest services rise to the top of the conversation.
- Per service: budget burn rate: Even when the SLO was met, the burn pattern matters. A service that finished the month at 99.91% from a starting position of 99.99% is in worse shape than one that finished at 99.91% from 99.92%. The trajectory is a leading indicator the static number hides.
- Per service: contributing incidents: The two or three events that drove most of the budget burn for the month. Brief description, root cause, what was done. Not a deep retro; the retro happens elsewhere. The review references the retro outcomes.
- Per service: action items in flight: The reliability work committed in previous reviews. Status update on each: items that have closed, items that are blocked, items that are slipping.
- 30 minutes, hard limit: The review is short: 30 minutes max for a portfolio of 10 to 20 services. Anything that does not fit gets surfaced as a separate follow-up. The time pressure is what keeps the meeting from becoming a deep-dive bog.
The agenda is the same every month. The discipline of sameness is what produces the comparable trajectory data over quarters and years.
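The green/yellow/red classification and riskiest-first ordering above can be sketched in a few lines. This is a minimal illustration, not any particular monitoring tool's API; the `ServiceMonth` fields, the borderline margin, and the sort key are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class ServiceMonth:
    """Hypothetical per-service record for one month of the review."""
    name: str
    target: float     # published SLO target, e.g. 99.9
    start_slo: float  # rolling SLO at the start of the month
    end_slo: float    # rolling SLO at the end of the month

def status(s: ServiceMonth, borderline_margin: float = 0.05) -> str:
    """Red if the target was missed, yellow if met but within the
    margin, green otherwise. The margin is an illustrative choice."""
    if s.end_slo < s.target:
        return "red"
    if s.end_slo - s.target < borderline_margin:
        return "yellow"
    return "green"

def risk_key(s: ServiceMonth) -> tuple:
    # Sort riskiest first: smallest headroom above target, then the
    # steepest monthly decline -- the trajectory the static number hides.
    return (s.end_slo - s.target, -(s.start_slo - s.end_slo))

# The two services from the burn-rate example: same final number,
# very different trajectories.
services = [
    ServiceMonth("checkout", 99.9, 99.92, 99.91),
    ServiceMonth("search", 99.9, 99.99, 99.91),
]
for s in sorted(services, key=risk_key):
    print(s.name, status(s), f"trajectory {s.end_slo - s.start_slo:+.2f}")
```

Both services are yellow, but the steeper decline puts the first service higher on the agenda, which is exactly the ordering the conversation needs.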
Attendees
The wrong attendee list kills the meeting. Too many people and the conversation becomes a status update for an audience. Too few and the decisions cannot be made. The right list is the people whose decisions the meeting needs.
- SRE leads: The people responsible for the SLO practice across the org. They run the meeting, surface the patterns, and connect the dots between services. Their job is to know the portfolio, not just individual services.
- Service owners: The engineering leads (or designated representatives) for the services on the agenda. They own their service's SLO performance and make commitments about reliability investment. Their presence is what turns the meeting from observation into action.
- Engineering leadership: The director or VP responsible for the engineering org's reliability posture. They allocate cross-team resources, escalate to executives when needed, and connect SLO performance to business outcomes. Their presence keeps the meeting at a strategic altitude.
- Right people, not all people: Anyone whose work is not relevant to this month's discussion does not need to be in the meeting. The list rotates with what is on the agenda. A meeting with 30 attendees is a status broadcast; a meeting with 6 attendees is a decision-making forum.
- No regular customer-facing roles unless escalated: Customer success, sales, and product do not attend the routine meeting. When SLO performance affects them (a missed quarter, a customer escalation, a tier-change conversation), they get pulled in for that specific topic. Otherwise the meeting stays engineering-internal.
- Notes go out to a wider list: The discussion is small; the audience for the outcomes is large. A short summary distributed afterward to engineering leadership, service teams, customer success, and any other interested parties keeps the broader org informed without diluting the meeting itself.
The attendee discipline is what keeps the meeting useful. Anyone who cannot make a decision in the meeting is consuming the time of those who can.
Output
The point of the meeting is the output, not the meeting itself. A review that ends with everyone nodding and no concrete actions has wasted everyone's time. The structure that produces real outcomes is investment decisions and tracked action items.
- Investment decisions: The biggest output is allocation: which service gets reliability investment next month, which service is fine, which service needs leadership escalation. The decisions are explicit and documented in the meeting notes.
- Action items, owners, deadlines: Each decision becomes one or more action items with a named owner and a target date. "Improve search SLO" is not an action item; "ship the distributed cache by month-end to reduce search p99 from 800 ms to 400 ms, owner: Maria" is.
- Tracked across reviews: Action items from the previous review are revisited at the next one. Items that closed get celebrated briefly; items that slipped get a status update; items opened this month go on the list. The continuity is what produces multi-month progress.
- Escalations explicit: When a reliability problem cannot be addressed at this meeting's level, the escalation is documented as an output. "Service X has missed its SLO three consecutive quarters; escalating to engineering leadership for the Q3 staffing decision."
- Concrete, never vague: "We should focus more on reliability" is the wrong output. "We are reallocating two engineers from team A to platform team B for Q3 to address the reliability backlog on service X" is the right one. The specificity is the difference between productive and performative.
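The carry-forward discipline above can be expressed as a small sketch: an action item is an owner, a due date, and a status, and the next review's list is everything from the last one that has not closed. The `ActionItem` fields and `carry_forward` helper are illustrative assumptions, not a real tracking tool's schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ActionItem:
    """Hypothetical record mirroring the discipline in the text:
    a description, a named owner, a target date, an explicit status."""
    description: str
    owner: str
    due: date
    status: str = "open"  # open | closed | blocked | slipped

def carry_forward(previous: list[ActionItem], today: date) -> list[ActionItem]:
    """Build the next review's list: drop closed items, keep the rest,
    and flag open items past their due date as slipped."""
    agenda = []
    for item in previous:
        if item.status == "closed":
            continue
        if item.status == "open" and item.due < today:
            item.status = "slipped"
        agenda.append(item)
    return agenda

# Example: one closed item drops off; one overdue item is flagged.
items = [
    ActionItem("Ship distributed cache; cut search p99 from 800 ms to 400 ms",
               owner="Maria", due=date(2024, 6, 30)),
    ActionItem("Add canary stage to checkout deploys",
               owner="Ravi", due=date(2024, 7, 15), status="closed"),
]
for item in carry_forward(items, today=date(2024, 7, 1)):
    print(item.status, item.owner, item.description)
```

The point of the structure is that slippage is surfaced mechanically rather than depending on someone remembering to ask.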
A monthly SLO review with a tight agenda, the right attendees, and concrete outputs is one of the highest-leverage operational meetings an engineering org runs. Nova AI Ops produces the per-service performance summary, the budget burn trends, the contributing incidents, and the open action item list automatically, so the meeting time is spent on decisions rather than on assembling the data.