Page Pattern Recognition
Patterns across pages reveal systemic issues.
Recognize patterns over individual pages
A page is data. A pattern of pages is signal. Most teams treat each page as a one-off and miss the pattern.
Patterns include: recurring time-of-day, recurring service, recurring root cause, recurring responder.
Recognizing patterns is how you graduate from incident response to prevention.
What to track
Pages per hour-of-day, per day-of-week. Cron jobs and traffic peaks show up here.
Pages per service per week. The top 3 noisy services rotate slowly. A service that appears in the top 3 two weeks running is a project, not a one-off.
Pages per root-cause category: network, deploy, capacity, dependency. Assign the category at incident close.
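The three tallies above can be sketched in a few lines of Python. This is a minimal sketch, not a prescribed implementation: the record shape (timestamp, service, root-cause category), the sample data, and the function name tally are assumptions made for illustration, and a real export from your paging tool will look different.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical page records: (ISO timestamp, service, root-cause category).
PAGES = [
    ("2024-03-01T15:02:00", "checkout", "deploy"),
    ("2024-03-01T15:40:00", "checkout", "deploy"),
    ("2024-03-04T02:00:00", "billing", "capacity"),
    ("2024-03-08T15:10:00", "search", "deploy"),
]

def tally(pages):
    """Count pages by hour-of-day, day-of-week, service per ISO week, and cause."""
    by_hour = Counter()
    by_weekday = Counter()
    by_service_week = Counter()
    by_cause = Counter()
    for ts, service, cause in pages:
        when = datetime.fromisoformat(ts)
        iso = when.isocalendar()  # (ISO year, ISO week, ISO weekday)
        by_hour[when.hour] += 1
        by_weekday[WEEKDAYS[when.weekday()]] += 1
        by_service_week[(service, iso[0], iso[1])] += 1
        by_cause[cause] += 1
    return by_hour, by_weekday, by_service_week, by_cause

by_hour, by_weekday, by_service_week, by_cause = tally(PAGES)
print(by_hour.most_common(3))
print(by_weekday.most_common(3))
```

Keying the per-service count on the ISO year and week keeps weeks comparable across a year boundary. Timestamps are assumed to already be in the team's local time zone; hour-of-day patterns blur if they are mixed.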
Acting on patterns
Friday 3pm spikes: investigate weekly batch jobs or release timing.
Recurring service: schedule a focused reliability sprint, not another bandage.
Same root cause across services: it's an infrastructure or platform issue, not a per-service issue.
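Two of the rules above are easy to check mechanically. The sketch below is one possible way to do it, under stated assumptions: incidents arrive as (service, root-cause category) pairs, and the thresholds min_services and min_pages are illustrative defaults, not recommended values. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def cross_service_causes(incidents, min_services=2):
    """Return root causes seen in at least min_services distinct services.

    These are candidates for an infrastructure or platform fix.
    """
    services_by_cause = defaultdict(set)
    for service, cause in incidents:
        services_by_cause[cause].add(service)
    return {
        cause: sorted(services)
        for cause, services in services_by_cause.items()
        if len(services) >= min_services
    }

def recurring_services(incidents, min_pages=3):
    """Return services with at least min_pages incidents in the window.

    These are candidates for a focused reliability sprint.
    """
    counts = Counter(service for service, _ in incidents)
    return sorted(s for s, n in counts.items() if n >= min_pages)

# Hypothetical incidents from one review window.
INCIDENTS = [
    ("checkout", "dependency"),
    ("search", "dependency"),
    ("billing", "dependency"),
    ("checkout", "deploy"),
    ("checkout", "capacity"),
]

print(cross_service_causes(INCIDENTS))
print(recurring_services(INCIDENTS))
```

The output is a shortlist for a human to review, not a verdict. A cause that spans services only points at the platform if the categories were assigned consistently.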
Tooling
PagerDuty analytics or BigPanda's reports give the per-service breakdown.
Build a simple weekly digest in the team channel. Top 3 noisy services, top 3 root causes, top 3 hours.
Capture the category at incident close. The category dropdown should be required, not optional.
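The weekly digest can be as small as three lines of text. The following is a minimal sketch that assumes pages for the week are available as (hour, service, root-cause category) tuples. The helper names and sample data are hypothetical, and posting to the team channel is left out because it depends on your chat tool.

```python
from collections import Counter

def top3(counts, label=str):
    """Format the three largest entries, breaking ties by key for stable output."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ", ".join(f"{label(key)} ({n})" for key, n in ranked[:3])

def weekly_digest(pages):
    """Build the digest text from (hour, service, cause) tuples."""
    hours = Counter(hour for hour, _, _ in pages)
    services = Counter(service for _, service, _ in pages)
    causes = Counter(cause for _, _, cause in pages)
    return "\n".join([
        "Noisy services: " + top3(services),
        "Root causes: " + top3(causes),
        "Hours: " + top3(hours, label=lambda h: f"{h:02d}:00"),
    ])

# Hypothetical week of pages.
WEEK = [
    (15, "checkout", "deploy"),
    (15, "checkout", "deploy"),
    (15, "search", "deploy"),
    (2, "billing", "capacity"),
    (2, "checkout", "capacity"),
    (9, "search", "dependency"),
]

print(weekly_digest(WEEK))
```

Keeping the digest to plain text means it can be pasted or posted anywhere. The required category field is what makes the second line trustworthy.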
Make it a recurring meeting
Pattern review every 2 weeks. 30 minutes, focused on the top items.
Skip the meeting if the team is small enough that everyone already sees the patterns. Above 5 engineers, the patterns get lost without a shared review.
Don't lecture during the meeting. Use the data to drive the next sprint's reliability work.