Postmortem on Vendor Incidents

Even when the root cause is not yours.

Overview

Postmortems on vendor incidents capture the team’s response and its dependency analysis even when the root cause is upstream. The instinct to skip the postmortem because "it was their fault" misses the point: the team’s response to the vendor incident is what determined customer impact, and the dependency analysis tells the team whether to reduce the vendor’s blast radius before the next incident. Vendor postmortems produce resilience improvements that are entirely under the team’s control.

The approach

The practical approach has five parts. Write a postmortem for every vendor incident that affected customers. Focus the analysis on the team’s response (detection time, communication quality, recovery actions) rather than on upstream blame. Generate resilience action items that reduce future vendor blast radius, such as multi-region deployment, fallback paths, and graceful degradation. Run a per-vendor dependency analysis to surface concentration risk. Finally, document each team’s vendor postmortem policy in the engineering handbook.
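
A minimal sketch, in Python, of two of these analyses: measuring team-controlled response times from a vendor incident record, and flagging per-vendor concentration risk across services. The VendorIncident record, its field names, the concentration_risk helper, and the 0.5 threshold are hypothetical choices for illustration, not a prescribed schema or tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VendorIncident:
    """Hypothetical record of one customer-affecting vendor incident."""
    vendor: str
    vendor_start: datetime        # when the vendor's degradation began
    team_detected: datetime       # when the team's own alerting fired
    customer_recovered: datetime  # when customer impact ended

    @property
    def detection_time(self) -> timedelta:
        # Team-controlled: how long it took the team to notice.
        return self.team_detected - self.vendor_start

    @property
    def time_to_recover(self) -> timedelta:
        # Total customer-facing impact window, from vendor start to recovery.
        return self.customer_recovered - self.vendor_start


def concentration_risk(deps: dict[str, set[str]], threshold: float = 0.5) -> dict[str, float]:
    """Return each vendor whose share of dependent services is at or above threshold.

    deps maps a service name to the set of vendors it depends on (hypothetical input).
    """
    counts = Counter(vendor for vendors in deps.values() for vendor in vendors)
    total = len(deps)
    # An empty deps yields an empty Counter, so no division by zero occurs.
    return {vendor: n / total for vendor, n in counts.items() if n / total >= threshold}


if __name__ == "__main__":
    services = {
        "checkout": {"payments-api", "cdn"},
        "search": {"cdn"},
        "auth": {"idp", "cdn"},
        "reports": {"object-store"},
    }
    print(concentration_risk(services))  # {'cdn': 0.75}

    incident = VendorIncident(
        vendor="cdn",
        vendor_start=datetime(2024, 5, 1, 9, 0),
        team_detected=datetime(2024, 5, 1, 9, 12),
        customer_recovered=datetime(2024, 5, 1, 10, 30),
    )
    print(incident.detection_time, incident.time_to_recover)  # 0:12:00 1:30:00
```

Here a single vendor (the hypothetical "cdn") backs three of four services, which is the kind of concentration a dependency analysis should surface before the next incident. Detection time is reported separately from total recovery time because it is the part of the response the team controls directly.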

Why this compounds

Vendor postmortem discipline compounds across years. Each postmortem produces resilience improvements that shrink the blast radius of the next incident from the same vendor; each dependency analysis surfaces concentration risk before it becomes the next incident; and the team builds a shared vocabulary for vendor resilience that pays off in every new vendor evaluation.

Nova AI Ops supports this discipline: it integrates with vendor telemetry, surfaces dependency patterns, and aids the team’s incident management.