Page Pattern Recognition
Patterns across pages reveal systemic issues.
Recognize patterns over individual pages
A single page is data; a pattern of pages is signal. Most teams treat each page as a one-off and miss the pattern, which keeps them in reactive incident response rather than graduating to prevention. Recurring time-of-day, recurring service, recurring root cause, and recurring responder are the four pattern axes worth watching.
- Page is data, pattern is signal. Most teams miss the pattern by treating each page as a one-off.
- Four pattern axes. Recurring time-of-day, recurring service, recurring root cause, recurring responder.
- Reactive vs preventive. Pattern recognition is how you graduate from incident response to prevention.
- Per-team pattern review. The pattern view is a team artifact, not an individual responder’s memory.
What to track
Three breakdowns make patterns visible. Pages per hour-of-day surface cron jobs and traffic peaks; pages per service per week surface noisy services that deserve a sprint; pages per root-cause category surface the systemic issues that span services.
- Per hour-of-day, per day-of-week. Cron jobs and traffic peaks show up here.
- Per service per week. The top 3 noisiest services change slowly; a service that makes the list twice is a project, not a one-off.
- Per root-cause category. Network, deploy, capacity, dependency; categorize at incident close.
- Per-responder load. Page count per engineer; supports fairness and burnout detection.
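The breakdowns above can be sketched with nothing more than counters. This is a minimal illustration, assuming each page is available as a record with a timestamp, a service, a root-cause category, and a responder; the field names, the sample data, and the `breakdowns` helper are hypothetical and do not reflect any particular tool's export format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page records; the field names are assumptions, not an export format.
PAGES = [
    {"at": datetime(2024, 3, 1, 15, 5), "service": "billing", "category": "deploy", "responder": "ana"},
    {"at": datetime(2024, 3, 1, 15, 20), "service": "billing", "category": "deploy", "responder": "ben"},
    {"at": datetime(2024, 3, 4, 2, 0), "service": "search", "category": "capacity", "responder": "ana"},
    {"at": datetime(2024, 3, 8, 15, 10), "service": "billing", "category": "dependency", "responder": "ana"},
]


def iso_week(moment):
    """Return (ISO year, ISO week number) for a datetime."""
    iso = moment.isocalendar()
    return iso[0], iso[1]


def breakdowns(pages):
    """Count pages along each tracking axis."""
    return {
        "hour_of_day": Counter(p["at"].hour for p in pages),
        # weekday(): 0 is Monday, 4 is Friday.
        "day_of_week": Counter(p["at"].weekday() for p in pages),
        # Key is (service, ISO year, ISO week).
        "service_week": Counter((p["service"],) + iso_week(p["at"]) for p in pages),
        "category": Counter(p["category"] for p in pages),
        "responder": Counter(p["responder"] for p in pages),
    }


counts = breakdowns(PAGES)
print(counts["hour_of_day"].most_common(3))
```

Keeping all of the counts in one structure makes it easy to feed the same data into a digest or a review without re-querying the source.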
Acting on patterns
Patterns demand action, not just observation. Friday 3pm spikes point to weekly batch jobs or release timing; a recurring service calls for a focused reliability sprint, not another bandage; the same root cause across services indicates an infrastructure issue, not a per-service one.
- Friday 3pm spikes. Investigate weekly batch jobs or release timing; the cadence reveals the cause.
- Recurring service. Schedule a focused reliability sprint, not another bandage; the service deserves real engineering investment.
- Cross-service root cause. Infrastructure or platform issue, not per-service; the fix lives at the platform layer.
- Per-pattern named owner. Each acted-on pattern has an owner and a deadline; supports follow-through.
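The cross-service check is the easiest of these to automate. The sketch below, assuming closed incidents carry a service name and a root-cause category, groups services by category and flags any category seen on at least two services. The record shape, the `cross_service_causes` helper, and the threshold of two are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical closed incidents; the field names are assumptions for illustration.
INCIDENTS = [
    {"service": "billing", "category": "network"},
    {"service": "search", "category": "network"},
    {"service": "billing", "category": "deploy"},
    {"service": "billing", "category": "deploy"},
]


def cross_service_causes(incidents, min_services=2):
    """Return root-cause categories seen on at least min_services distinct services."""
    services_by_cause = defaultdict(set)
    for incident in incidents:
        services_by_cause[incident["category"]].add(incident["service"])
    return {
        cause: sorted(services)
        for cause, services in services_by_cause.items()
        if len(services) >= min_services
    }


platform_candidates = cross_service_causes(INCIDENTS)
print(platform_candidates)
```

In this sample, "deploy" recurs but only on one service, so it stays a per-service concern, while "network" spans two services and becomes a candidate for a platform-level fix with a named owner.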
Tooling
The pattern view needs tooling. PagerDuty analytics or BigPanda reports give per-service breakdowns; a weekly digest in the team channel surfaces top noisy services, top root causes, top hours; auto-categorisation at incident close keeps the data clean enough to query.
- PagerDuty analytics or BigPanda. Per-service breakdown; the standard analytics surface.
- Weekly team-channel digest. Top 3 noisy services, top 3 root causes, top 3 hours; visibility is the first action.
- Auto-categorisation at close. Category dropdown required, not optional; clean data makes the patterns queryable.
- Per-incident category audit. Random sample reviewed for category accuracy; supports data quality.
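The digest and the audit sample can both be prototyped before committing to a vendor feature. The following is a rough sketch under stated assumptions: pages are plain records with an hour, a service, and a category; `weekly_digest` and `audit_sample` are hypothetical helpers; nothing here calls the PagerDuty or BigPanda APIs, and posting the text to a channel is left out.

```python
import random
from collections import Counter

# Hypothetical page records for one week; the field names are assumptions.
WEEK_PAGES = [
    {"hour": 15, "service": "billing", "category": "deploy"},
    {"hour": 15, "service": "billing", "category": "deploy"},
    {"hour": 15, "service": "billing", "category": "dependency"},
    {"hour": 2, "service": "search", "category": "capacity"},
]


def format_top(counter, top_n):
    """Render the top_n entries of a Counter as 'name (count)' pairs."""
    return ", ".join(f"{name} ({count})" for name, count in counter.most_common(top_n))


def weekly_digest(pages, top_n=3):
    """Build the digest text: top services, top root causes, top hours."""
    services = Counter(p["service"] for p in pages)
    causes = Counter(p["category"] for p in pages)
    hours = Counter(f"{p['hour']:02d}:00" for p in pages)
    return "\n".join([
        f"Noisy services: {format_top(services, top_n)}",
        f"Root causes: {format_top(causes, top_n)}",
        f"Hours: {format_top(hours, top_n)}",
    ])


def audit_sample(incidents, k, seed=None):
    """Pick up to k incidents at random for a category-accuracy review."""
    rng = random.Random(seed)
    return rng.sample(incidents, min(k, len(incidents)))


digest = weekly_digest(WEEK_PAGES)
print(digest)
to_review = audit_sample(WEEK_PAGES, k=2, seed=1)
```

Passing a seed makes the audit sample reproducible, which helps when two reviewers need to look at the same incidents.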
Make it a recurring meeting
The pattern review should be a meeting, not a dashboard. Hold it every two weeks for 30 minutes, focused on the top items. Teams of fewer than 5 engineers can skip it because everyone already sees the patterns; above that size, the patterns get lost unless a meeting forces the conversation.
- Two-week cadence. 30 minutes, focused on the top items; the recurring forum drives action.
- Skip below 5 engineers. Small teams already see the patterns; the meeting overhead is not justified.
- Data drives sprint, not lecture. Use the data to inform the next sprint’s reliability work; avoid blame.
- Per-meeting action capture. Each review produces named actions; the meeting is the system that converts pattern to fix.