Alerts From Customer Feedback

Some incident signals reach you first from customers. Convert them into alerts.

The gap

Customers sometimes notice problems before instrumentation does. A regional CDN issue, a partner outage, a slow third-party API: synthetic and APM monitors miss these. Support tickets are a signal source: three tickets with the same complaint in 10 minutes is a real incident even if no monitor has fired. Treat customer feedback as a first-class alerting input, not a fallback.

Customers ahead of instrumentation. Regional CDN, partner outage, slow third-party; monitors miss these.
Three-ticket-in-10-min signal. Same complaint, same window; the threshold for declaring an incident.
First-class alerting input. Not a fallback; the signal source has its own pipeline.
Per-journey ticket signal. Each user journey has a distinct ticket-volume signature; supports correct triage.

The pipeline

The pipeline wires support tools into the alerting backbone. Connect Zendesk, Intercom, or Front to the backbone; trigger when ticket volume crosses a baseline within a fixed window; use a simple anomaly model (more than N tickets matching keyword K within window W, with N and W tuned per product surface); and when the pattern fires, page on-call rather than the support team, because customer feedback alerts are operational signals.

Support tool integration. Zendesk, Intercom, Front; webhook into alerting backbone.
Baseline-crossing trigger. Ticket volume crosses baseline within a fixed window.
Simple anomaly model. N tickets matching keyword K within window W; tune per surface.
Page on-call, not support. Customer feedback alerts are operational signals, not CX issues.
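
The N/K/W model above can be sketched as a sliding-window counter. This is a minimal illustration, not a definitive implementation: the class name `KeywordWindowRule`, its fields, and the sample tickets are hypothetical, matching is a plain case-insensitive substring test, tickets are assumed to arrive in timestamp order, and the rule fires at N or more matches so that the three-ticket rule is simply `n=3`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class KeywordWindowRule:
    """Fires when at least `n` tickets matching `keyword` arrive within `window_s` seconds."""

    keyword: str      # K: substring to look for (case-insensitive)
    n: int            # N: number of matching tickets that triggers the rule
    window_s: float   # W: window length in seconds
    _hits: deque = field(default_factory=deque, repr=False)

    def observe(self, ts: float, text: str) -> bool:
        """Record one ticket (assumed in timestamp order); return True if the rule fires."""
        if self.keyword.lower() in text.lower():
            self._hits.append(ts)
        # Drop matches that have aged out of the window.
        while self._hits and ts - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.n


# Hypothetical tickets: (seconds since start, subject line).
rule = KeywordWindowRule(keyword="checkout", n=3, window_s=600)
tickets = [
    (0, "Checkout broken on mobile"),
    (120, "Where is my invoice?"),
    (300, "checkout page spinning"),
    (540, "Cannot complete checkout"),
]
fired = [rule.observe(ts, text) for ts, text in tickets]
print(fired)  # [False, False, False, True]
```

In a real pipeline, `observe` would be called from whatever handler receives the support tool's webhook, and a `True` result would be forwarded to the paging system; both of those integrations are outside this sketch.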

What to listen for

Three signal types catch most issues: specific feature names (“checkout broken”, “login spinning”, “cannot reset password”), which map directly to user journeys; geographic clusters (three tickets in 5 minutes from one country usually point to a regional CDN or DNS issue); and spike patterns (a 10x increase over a 1-hour baseline is almost always real, regardless of keywords).

Feature-name keywords. Checkout broken, login spinning, password reset; map to user journeys.
Geographic clusters. Three tickets in 5 minutes from one country; regional CDN or DNS issue.
10x spike pattern. Over 1-hour baseline; almost always real regardless of keywords.
Per-signal threshold tuning. Each signal type has its own threshold; supports targeted detection.
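
The geographic-cluster and spike signals can each be expressed as a small check with its own threshold. The sketch below is illustrative only: the function names, the `(timestamp, country)` ticket shape, and the choice to floor the baseline at one ticket (so a silent baseline hour does not turn a single ticket into a spike) are assumptions, not part of the original description.

```python
from collections import Counter


def geo_clusters(tickets, now, window_s=300, min_count=3):
    """Return countries with at least `min_count` tickets in the last `window_s` seconds.

    `tickets` is an iterable of (timestamp_seconds, country_code) pairs.
    """
    recent = Counter(
        country for ts, country in tickets if 0 <= now - ts <= window_s
    )
    return sorted(c for c, count in recent.items() if count >= min_count)


def is_spike(current_count, baseline_count, factor=10.0):
    """True when the current hour's volume is at least `factor` times the baseline hour."""
    # Assumption: treat an empty baseline as 1 ticket to avoid firing on tiny volumes.
    return current_count >= factor * max(baseline_count, 1)


# Hypothetical data: three tickets from DE inside the 5-minute window.
tickets = [(0, "DE"), (60, "DE"), (100, "US"), (200, "DE"), (250, "FR")]
print(geo_clusters(tickets, now=280))  # ['DE']
print(is_spike(45, 4))                 # True  (45 >= 40)
print(is_spike(12, 4))                 # False (12 < 40)
```

Keeping the checks separate makes it straightforward to tune `min_count`, `window_s`, and `factor` independently per signal type.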

Avoid overfitting

Three guardrails prevent overfitting. Don’t alert on every ticket, because the background noise of normal support volume drowns the signal; use a hold-down so the alert doesn’t fire if a related platform alert has fired in the last 30 minutes (the customer feedback is then a duplicate); and run weekly retros on customer-feedback alerts, because false positives outpace false negatives 2 to 1.

No per-ticket alerts. Background noise drowns the signal; threshold-based only.
30-minute platform-alert hold-down. Suppress customer feedback that duplicates a related platform alert.
Weekly retros. False positives outpace false negatives 2:1; tune accordingly.
Per-week tuning record. Documented adjustments; supports continued accuracy.
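
The 30-minute hold-down can be sketched as a filter applied just before a customer-feedback alert pages anyone. The function name and data shapes are hypothetical, and "related" is assumed here to mean "tagged with the same user journey"; a real system may need a richer notion of relatedness.

```python
def should_fire(feedback_ts, journey, platform_alerts, hold_down_s=1800):
    """Return False if a related platform alert fired within the hold-down period.

    `platform_alerts` is an iterable of (timestamp_seconds, journey) pairs.
    Assumption: an alert is "related" when it carries the same journey tag.
    """
    for alert_ts, alert_journey in platform_alerts:
        age = feedback_ts - alert_ts
        if alert_journey == journey and 0 <= age <= hold_down_s:
            return False  # duplicate of an alert on-call already has
    return True


# Hypothetical platform alerts: checkout fired at t=1000 s, login at t=100 s.
platform_alerts = [(1000, "checkout"), (100, "login")]
print(should_fire(2000, "checkout", platform_alerts))  # False: 1000 s after a checkout alert
print(should_fire(2000, "login", platform_alerts))     # True: login alert is 1900 s old
print(should_fire(2000, "search", platform_alerts))    # True: no related alert
```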

Apply this quarter

Roll out in stages. Pick your top 3 user journeys and wire ticket-volume alerts for each with a 10-minute window and a 3x baseline trigger; test in shadow mode for two weeks (don’t page, just observe, and tune until the false positive rate is under 20%); then promote to paging and track time-to-detect for incidents that started as customer feedback alerts.

Top 3 user journeys first. 10-minute window; 3x baseline trigger; the highest-leverage start.
2-week shadow mode. Don’t page; tune until false positive rate is under 20%.
Promote to paging. Track time-to-detect for customer-feedback-originated incidents.
Per-journey TTD tracking. The metric proves the value; supports continued investment.
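
The shadow-mode gate and the time-to-detect metric can be computed from a simple review log. This is a sketch under stated assumptions: the function names are hypothetical, "false positive rate" is taken to mean the share of shadow alerts that did not correspond to a real incident, and time-to-detect is summarized as a median in minutes.

```python
from statistics import median


def false_positive_rate(labels):
    """Share of shadow alerts judged not to be real incidents.

    `labels` is a list of booleans, True when the alert matched a real incident.
    """
    if not labels:
        return 0.0
    return sum(1 for real in labels if not real) / len(labels)


def ready_to_page(labels, max_fp_rate=0.20):
    """Promote only when there is data and the false positive rate is under the limit."""
    return bool(labels) and false_positive_rate(labels) < max_fp_rate


def median_ttd_minutes(incidents):
    """Median time-to-detect, given (started_at_s, detected_at_s) pairs."""
    return median((detected - started) / 60 for started, detected in incidents)


# Hypothetical two-week shadow log: 9 real incidents, 1 false alarm.
shadow_labels = [True] * 9 + [False]
print(false_positive_rate(shadow_labels))  # 0.1
print(ready_to_page(shadow_labels))        # True

# Hypothetical incidents detected 5, 10, and 20 minutes after they started.
print(median_ttd_minutes([(0, 300), (0, 600), (0, 1200)]))  # 10.0
```

Running the same calculation per journey gives the per-journey time-to-detect figure described above.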