The SRE Backlog Anti-Pattern Trap

An SRE team with a 200-item backlog is not winning. Here are the signs that you have fallen into the backlog trap, and how to dig out.

Signs of the trap

The trap looks like productivity. The team is busy, tickets keep coming, but the underlying reliability never improves and the queue keeps growing.

Aged backlog. 50+ open SRE tickets with no activity in the last 30 days.
Ignored incidents. New incidents create new tickets that join the pile; nobody works them.
Firefighter identity. The team self-describes as firefighters; preventive work is aspirational.
Repeat root causes. The same class of incident appears twice in a quarter because the fix is queued, not shipped.
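The first and fourth signs can be checked mechanically. The following is a minimal Python sketch. It assumes your tracker can export each ticket's last-activity date and each incident's root-cause class. The `Ticket` and `Incident` shapes, their field names, and the 91-day quarter window are hypothetical and do not come from any tracker's API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical export shapes; adapt the fields to your tracker's data.
@dataclass
class Ticket:
    key: str
    last_activity: date
    is_open: bool = True

@dataclass
class Incident:
    occurred: date
    root_cause_class: str

STALE_AFTER = timedelta(days=30)  # "no activity in the last 30 days"
AGED_THRESHOLD = 50               # "50+ open SRE tickets"
QUARTER = timedelta(days=91)      # assumed approximation of one quarter

def stale_tickets(tickets, today):
    """Open tickets whose last activity is more than 30 days old."""
    return [t for t in tickets if t.is_open and today - t.last_activity > STALE_AFTER]

def repeat_root_causes(incidents, today):
    """Root-cause classes seen at least twice in the trailing quarter."""
    recent = Counter(i.root_cause_class for i in incidents
                     if today - i.occurred <= QUARTER)
    return sorted(cls for cls, n in recent.items() if n >= 2)

def trap_signs(tickets, incidents, today):
    stale = stale_tickets(tickets, today)
    return {
        "stale_open_tickets": len(stale),
        "aged_backlog": len(stale) >= AGED_THRESHOLD,
        "repeat_root_causes": repeat_root_causes(incidents, today),
    }

# Usage: 60 tickets untouched for 45 days, plus one root cause that recurred.
today = date(2024, 6, 30)
tickets = [Ticket(f"SRE-{n}", today - timedelta(days=45)) for n in range(60)]
incidents = [Incident(date(2024, 5, 2), "cert-expiry"),
             Incident(date(2024, 6, 20), "cert-expiry")]
print(trap_signs(tickets, incidents, today))
# {'stale_open_tickets': 60, 'aged_backlog': True, 'repeat_root_causes': ['cert-expiry']}
```

Running this weekly turns two of the signs from a feeling into a number that can be tracked.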

Causes

The trap is rarely a skill problem. It is a planning, capacity, and policy problem that compounds quietly until the queue is unrecoverable.

Reactive bias. Incidents preempt planned work every sprint; preventive items never reach 'in progress'.
Optimistic capacity. The team commits to 5 items per sprint but actually finishes 2; the unfinished 3 carry over into the next sprint.
No retirement policy. Tickets never close, never get rejected, never get 'will not fix'; the queue grows monotonically.
No prioritisation function. Every ticket feels P1; without ruthless ranking, the queue defaults to creation-date order.
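The optimistic-capacity arithmetic and the missing ranking function can both be sketched in a few lines of Python. This sketch rests on stated assumptions. `backlog_after` assumes new work keeps being admitted at the planned rate while only the real rate completes. The `Item` fields and the `score` formula are a hypothetical ranking heuristic rather than a standard one, so substitute whatever your team actually weighs.

```python
from dataclasses import dataclass
from datetime import date

def backlog_after(sprints, admitted_per_sprint, completed_per_sprint, start=0):
    """Queue length at the end of each sprint when work is admitted at the
    planned rate but completed at the real rate."""
    backlog, history = start, []
    for _ in range(sprints):
        backlog += admitted_per_sprint
        backlog -= min(backlog, completed_per_sprint)
        history.append(backlog)
    return history

@dataclass
class Item:
    key: str
    created: date
    impact: int       # hypothetical 1-5 scale: damage if the issue recurs
    recurrences: int  # incidents linked to this item this quarter
    effort: int       # hypothetical 1-5 scale: rough size of the fix

def score(item):
    # Hypothetical heuristic: impact weighted by recurrence, divided by effort.
    return item.impact * (1 + item.recurrences) / item.effort

def rank(items):
    """Highest score first; creation date only breaks ties."""
    return sorted(items, key=lambda i: (-score(i), i.created))

# Planning for 5 while finishing 2 adds 3 items to the queue every sprint.
print(backlog_after(6, admitted_per_sprint=5, completed_per_sprint=2))
# [3, 6, 9, 12, 15, 18]

items = [
    Item("SRE-101", date(2023, 1, 9), impact=2, recurrences=0, effort=3),
    Item("SRE-412", date(2024, 5, 2), impact=5, recurrences=2, effort=2),
]
print([i.key for i in rank(items)])
# ['SRE-412', 'SRE-101']
```

Because creation date only breaks ties here, an old low-impact ticket no longer outranks a recent, recurring one by default.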

Dig out

Digging out is uncomfortable. It requires closing tickets people care about and saying no to incoming work for a quarter while the team rebuilds slack.

Triage aggressively. Anything older than 90 days with no activity closes as 'will not fix' unless an owner fights for it.
Cap active work. Allow at most 30 in-flight tickets; new tickets wait in the queue until a slot frees up, which forces prioritisation conversations.
Reserve proactive capacity. 30% of team time blocked on calendars for preventive work; defend it against incident pull.
Quarterly reset. Run the triage every quarter; once is a clean-up, repeated is the discipline.
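The three numeric rules above (the 90-day triage, the 30-ticket cap, and the 30% proactive reservation) translate directly into code. The following is a minimal Python sketch that assumes a ticket export with a last-activity date. The `owner_defends` flag stands in for "an owner fights for it". That flag, the `Ticket` shape, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

TRIAGE_AGE = timedelta(days=90)  # "older than 90 days with no activity"
WIP_CAP = 30                     # "maximum 30 in-flight tickets"
PROACTIVE_PERCENT = 30           # "30% of team time"

@dataclass
class Ticket:
    key: str
    last_activity: date
    owner_defends: bool = False  # hypothetical: an owner argued to keep it
    status: str = "open"

def quarterly_triage(tickets, today):
    """Close stale open tickets as 'will not fix' unless an owner defends
    them. Mutates the tickets and returns the keys that were closed."""
    closed = []
    for t in tickets:
        if (t.status == "open" and today - t.last_activity > TRIAGE_AGE
                and not t.owner_defends):
            t.status = "will not fix"
            closed.append(t.key)
    return closed

def can_start(in_flight_count):
    """New work starts only when a WIP slot is free; otherwise it waits."""
    return in_flight_count < WIP_CAP

def proactive_hours(engineers, hours_per_engineer):
    """Hours to block on calendars for preventive work in a period."""
    return engineers * hours_per_engineer * PROACTIVE_PERCENT / 100

# Usage
today = date(2024, 7, 1)
backlog = [
    Ticket("SRE-7", date(2024, 1, 15)),
    Ticket("SRE-9", date(2024, 1, 20), owner_defends=True),
    Ticket("SRE-30", date(2024, 6, 12)),
]
print(quarterly_triage(backlog, today))  # ['SRE-7']
print(can_start(30))                     # False
print(proactive_hours(6, 80))            # 144.0
```

Running `quarterly_triage` on a schedule, rather than once, is what turns the clean-up into the discipline.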