Postmortem on Vendor Incidents
Write one even when the root cause is not yours.
Overview
Postmortems on vendor incidents capture the team’s response and dependency analysis even when the root cause is upstream. The instinct to skip the postmortem because "it was their fault" misses the point: the team’s response to the vendor incident is what determined customer impact, and the dependency analysis informs whether the team should reduce vendor blast radius before the next incident. Vendor postmortems produce resilience improvements that are entirely under the team’s control.
- Even when not yours. Write a team postmortem for every vendor incident; the team’s response is the part it can act on.
- Incident timeline. Record the team’s own timeline for each vendor incident; it documents how the team detected, responded, and recovered.
- Team response analysis. Evaluate the team’s response to each vendor incident; surface what worked and what could have worked better.
- Action items for resilience plus dependency analysis. Plan resilience improvements for each vendor (multi-region, fallback, graceful degradation) and review each vendor dependency for blast-radius reduction.
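The timeline and response analysis above can be made concrete by deriving a few intervals from the team’s own timestamps. The following is a minimal sketch, not a prescribed format: the `VendorIncidentTimeline` fields, the metric names, and the example vendor and timestamps are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class VendorIncidentTimeline:
    """Key timestamps from the team's side of a vendor incident (hypothetical schema)."""
    vendor: str
    impact_started: datetime      # first customer-visible impact
    team_detected: datetime       # first alert or report the team acted on
    customers_notified: datetime  # first status update sent to customers
    impact_ended: datetime        # customer impact resolved


def response_metrics(t: VendorIncidentTimeline) -> dict[str, timedelta]:
    """Derive the response intervals a postmortem might review."""
    return {
        "time_to_detect": t.team_detected - t.impact_started,
        "time_to_notify": t.customers_notified - t.team_detected,
        "time_to_recover": t.impact_ended - t.team_detected,
        "customer_impact": t.impact_ended - t.impact_started,
    }


if __name__ == "__main__":
    # Illustrative values only; not taken from a real incident.
    timeline = VendorIncidentTimeline(
        vendor="example-payments-vendor",
        impact_started=datetime(2024, 1, 10, 14, 0),
        team_detected=datetime(2024, 1, 10, 14, 12),
        customers_notified=datetime(2024, 1, 10, 14, 30),
        impact_ended=datetime(2024, 1, 10, 15, 5),
    )
    for name, delta in response_metrics(timeline).items():
        print(f"{name}: {delta}")
```

Every interval here is measured from the team’s own events rather than the vendor’s status page, which keeps the analysis on the part of the incident the team controls.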
The approach
The practical approach has five parts. Write a postmortem for every vendor incident that affected customers. Focus the analysis on team response (detection time, communication quality, recovery actions) rather than upstream blame. Generate resilience action items that reduce future vendor blast radius (multi-region, fallback paths, graceful degradation). Conduct a dependency analysis for each vendor to surface concentration risk. Finally, document the team’s vendor postmortem policy in the engineering handbook.
- Per-vendor postmortem. Write a team postmortem for each vendor incident; the document captures the team’s response.
- Team response analysis. Evaluate the team’s response to each incident: what worked, what did not, what to change for next time.
- Action items for resilience. Plan resilience improvements for each vendor: multi-region, fallback paths, graceful degradation.
- Per-vendor dependency analysis plus documented policy. Review each vendor dependency for concentration risk, and commit the team’s vendor postmortem policy for operational review.
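One way to sketch the dependency analysis is to measure, for each vendor, the share of services that depend on it with no fallback path. This is an illustrative assumption about how concentration risk might be scored, not a standard metric: the `Dependency` record, the `concentration_risk` function, the 0.5 threshold, and the service and vendor names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One service-to-vendor dependency (hypothetical schema)."""
    service: str
    vendor: str
    has_fallback: bool  # True if the service degrades gracefully without the vendor


def concentration_risk(deps: list[Dependency], threshold: float = 0.5) -> dict[str, float]:
    """Return vendors whose share of services without a fallback meets the threshold.

    The share is (services depending on the vendor with no fallback) divided by
    (all services that appear in deps). Results are ordered by descending share.
    """
    services = {d.service for d in deps}
    exposed: dict[str, set[str]] = defaultdict(set)
    for d in deps:
        if not d.has_fallback:
            exposed[d.vendor].add(d.service)
    shares = {vendor: len(svcs) / len(services) for vendor, svcs in exposed.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return {vendor: share for vendor, share in ranked if share >= threshold}


if __name__ == "__main__":
    # Illustrative inventory only.
    inventory = [
        Dependency("checkout", "payments-vendor", has_fallback=False),
        Dependency("billing", "payments-vendor", has_fallback=False),
        Dependency("search", "search-vendor", has_fallback=True),
        Dependency("checkout", "email-vendor", has_fallback=False),
    ]
    for vendor, share in concentration_risk(inventory).items():
        print(f"{vendor}: {share:.0%} of services have no fallback")
```

A vendor that crosses the threshold is a candidate for the resilience action items listed above, such as a fallback path or graceful degradation for the affected services.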
Why this compounds
Vendor postmortem discipline compounds across years. Each postmortem produces resilience improvements that reduce future blast radius from the same vendor; each dependency analysis surfaces concentration risk before it becomes the next incident; and the team builds a vocabulary for vendor resilience that pays off on every new vendor evaluation.
- Resilience. Action items reduce vendor blast radius; the next incident from the same vendor produces less customer impact.
- Team learning. Each vendor postmortem teaches response patterns; the team learns where its detection and recovery muscles are weak.
- Operational culture. Vendor postmortems signal that team response matters regardless of upstream cause; ownership of customer impact stays with the team.
- Institutional knowledge. Each postmortem teaches vendor patterns; the team learns which vendors are reliable and which need fallback strategies.
Vendor postmortems are an operational discipline that pays off across years. Nova AI Ops integrates with vendor telemetry, surfaces dependency patterns, and supports the team’s incident management practice.