By Samson Tanimawo, PhD. Published Apr 11, 2026.

Incident Replay From Traces

Replaying an incident from its captured traces lets you check that a proposed fix would actually have prevented it. This post covers the pattern, the tooling, and the high-stakes incidents that warrant the effort.

The idea

Capture the trace context of an incident: the services involved, the sequence of events, and the key timings.
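As a rough illustration of what "trace context" could look like once captured, the sketch below reduces a list of spans to the services involved, the event sequence, and the slowest step. The `Span` fields, the `summarize` helper, and the sample data are all hypothetical; a real tracing backend will expose its own span schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span shape, not any specific tracing library's schema.
    service: str
    operation: str
    start_ms: int      # offset from the start of the incident window
    duration_ms: int


def summarize(spans):
    """Reduce raw spans to services involved, event sequence, and key timing."""
    ordered = sorted(spans, key=lambda s: s.start_ms)
    slowest = max(ordered, key=lambda s: s.duration_ms)
    return {
        "services": sorted({s.service for s in ordered}),
        "sequence": [f"{s.service}:{s.operation}" for s in ordered],
        "slowest": f"{slowest.service}:{slowest.operation}",
        "slowest_ms": slowest.duration_ms,
    }


# Made-up spans for illustration only.
spans = [
    Span("db", "query", 15, 900),
    Span("gateway", "GET /checkout", 0, 950),
    Span("cart", "load", 5, 920),
]
summary = summarize(spans)
print(summary["sequence"])
```

Keeping the summary small and serializable makes it easier to attach to the incident record and to feed into a replay later.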

Recreate the incident in pre-prod from that context and confirm that the failure reproduces. Then apply the proposed fix, replay again, and verify that the fix would have prevented the incident.
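The recreate, fix, and verify loop can be sketched as a tiny harness. This is a minimal sketch under stated assumptions: the captured events are plain dictionaries, and the two handlers stand in for the code before and after a hypothetical fix (an empty cart causing a division by zero). None of these names come from a real tool.

```python
def replay(events, handler):
    """Feed captured events to a handler in order and collect the failures."""
    failures = []
    for event in events:
        try:
            handler(event)
        except Exception as exc:
            failures.append((event["id"], type(exc).__name__))
    return failures


def handler_before_fix(event):
    # Hypothetical bug: an empty cart divides by zero.
    return event["total"] / len(event["items"])


def handler_after_fix(event):
    # Hypothetical fix: guard the empty-cart case.
    if not event["items"]:
        return 0.0
    return event["total"] / len(event["items"])


# Made-up captured events for illustration only.
events = [
    {"id": "e1", "total": 30.0, "items": ["a", "b"]},
    {"id": "e2", "total": 0.0, "items": []},
]

baseline = replay(events, handler_before_fix)
fixed = replay(events, handler_after_fix)

# The fix counts as validated only if the replay reproduces the failure
# without the fix and runs clean with it.
fix_validated = bool(baseline) and not fixed
print(baseline, fixed, fix_validated)
```

Running the baseline first matters: if the unfixed code does not fail under replay, a clean run with the fix proves nothing.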

When to do it

Reserve replay for Sev 1 and Sev 2 incidents. Sev 3 and Sev 4 incidents usually do not justify the engineering time.

Recurring incident classes are the other good candidate. Replay tooling pays for itself when the same class of incident hits multiple times.
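The two criteria above can be written down as a simple triage rule. The function name, the incident class labels, and the `min_recurrences` threshold of 2 are assumptions made for this sketch; the article only says "multiple times", so tune the threshold to your own incident history.

```python
from collections import Counter


def should_replay(severity, incident_class, history, min_recurrences=2):
    """Return True when an incident is worth replaying.

    severity: 1 (worst) to 4.
    history: incident class labels of past incidents (hypothetical labels).
    min_recurrences: assumed threshold for "recurring".
    """
    if severity <= 2:
        return True
    # Lower-severity incidents qualify only if their class keeps coming back.
    return Counter(history)[incident_class] >= min_recurrences


# Made-up history for illustration only.
history = ["cache-stampede", "cert-expiry", "cache-stampede"]

print(should_replay(1, "cert-expiry", history))
print(should_replay(3, "cache-stampede", history))
print(should_replay(4, "cert-expiry", history))
```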

Limits

Stateful systems are hard to replay. The database state at replay time differs from the state at capture time, so the same requests can produce different results.
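One way to at least detect this problem is to fingerprint the relevant state at capture time and compare it with the replay environment before trusting the result. The sketch below assumes the state can be exported as a list of JSON-serializable rows; `state_fingerprint` and the sample rows are hypothetical, and large tables would need sampling or per-table checksums instead.

```python
import hashlib
import json


def state_fingerprint(rows):
    """Hash a set of rows so that row order does not affect the result."""
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in rows)
    payload = "\n".join(canonical_rows).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up rows for illustration only.
captured_rows = [{"id": 1, "qty": 2}, {"id": 2, "qty": 5}]
replay_rows = [{"id": 2, "qty": 6}, {"id": 1, "qty": 2}]  # qty changed for id 2

drifted = state_fingerprint(captured_rows) != state_fingerprint(replay_rows)
if drifted:
    print("State differs from capture; replay results may not be comparable.")
```

A mismatch does not make the replay useless, but it does mean a clean run should be read with more caution.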

Replay is a complement to root-cause analysis, not a substitute.