Availability Zone Isolation Test

Chaos engineering is how AZ failures get tested. This article covers the test scenario, the metrics to watch, and the bugs the test catches.

The scenario

The availability zone isolation test is the chaos engineering exercise that validates whether a multi-AZ deployment actually survives the loss of an AZ. Most teams claim multi-AZ readiness; few have tested it. The test is the discipline that converts the claim into demonstrated fact.

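What the scenario looks like: one AZ is taken out of service while the system is carrying load, and the team observes whether the remaining AZs keep serving.

The scenario is straightforward in theory; the execution requires preparation. That preparation is the team's investment, and the return is a validated answer instead of an assumed one.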
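
Part of the preparation can happen on paper before any traffic is touched. The sketch below is a minimal capacity check, assuming a deliberately simplified model in which each AZ holds a known number of interchangeable instances and peak load needs a fixed number of them. The `Deployment` class, its fields, and the AZ names are hypothetical and exist only for illustration; a real isolation test also exercises dependencies that a simple instance count cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    # Hypothetical model, not taken from any real tool.
    instances_per_az: dict[str, int]   # healthy instances in each AZ
    required_instances: int            # instances needed to carry peak load


def surviving_capacity(deployment: Deployment, lost_az: str) -> int:
    """Instances left when every instance in lost_az is gone."""
    return sum(
        count
        for az, count in deployment.instances_per_az.items()
        if az != lost_az
    )


def azs_that_cannot_be_lost(deployment: Deployment) -> list[str]:
    """AZs whose loss would leave fewer instances than peak load needs."""
    return sorted(
        az
        for az in deployment.instances_per_az
        if surviving_capacity(deployment, az) < deployment.required_instances
    )


# Illustrative numbers only.
example = Deployment(
    instances_per_az={"az-a": 4, "az-b": 4, "az-c": 2},
    required_instances=7,
)
print(azs_that_cannot_be_lost(example))  # ['az-a', 'az-b']
```

In this example the deployment spans three AZs but survives the loss of only one of them, which is the kind of gap between claimed and actual multi-AZ readiness that the live test is meant to expose.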

Metrics

The test is judged by metrics. Success criteria are defined in advance, and the test passes or fails against them. Without explicit metrics, the test is an exercise in opinion.

Metrics are what convert the test from a feeling-based exercise into a data-based validation: a conclusion is either backed by a measurement taken during the test or it is hand-waving.
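
One way to make "pass or fail against criteria defined in advance" concrete is to write the criteria down as data and evaluate them mechanically. The following is a minimal sketch, not a definitive implementation: the `Criterion` class, the metric names (`error_rate`, `p99_latency_ms`), and the thresholds are all assumptions chosen for illustration.

```python
import operator
from dataclasses import dataclass

# Supported comparisons between an observed value and its threshold.
_COMPARATORS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Criterion:
    # Hypothetical structure for one success criterion.
    metric: str        # name of the metric, e.g. "error_rate"
    comparator: str    # "<=" or ">="
    threshold: float   # limit agreed before the test runs


def evaluate(criteria: list[Criterion], observed: dict[str, float]) -> dict[str, bool]:
    """Return, per metric, whether the observed value met its criterion."""
    if not criteria:
        # No criteria means no basis for a verdict.
        raise ValueError("define at least one success criterion before the test")
    results = {}
    for criterion in criteria:
        value = observed.get(criterion.metric)
        compare = _COMPARATORS[criterion.comparator]
        # A metric that was never measured counts as a failure.
        results[criterion.metric] = (
            value is not None and compare(value, criterion.threshold)
        )
    return results


def verdict(criteria: list[Criterion], observed: dict[str, float]) -> str:
    """The test passes only if every criterion is met."""
    return "pass" if all(evaluate(criteria, observed).values()) else "fail"


# Illustrative criteria and measurements only.
criteria = [
    Criterion("error_rate", "<=", 0.01),
    Criterion("p99_latency_ms", "<=", 500.0),
]
observed = {"error_rate": 0.004, "p99_latency_ms": 620.0}
print(verdict(criteria, observed))  # fail
```

Treating a missing metric as a failure, and refusing to produce a verdict when no criteria exist, keeps the outcome tied to measurements: the sketch cannot report a pass that nobody measured.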

Bugs found

The test almost always finds bugs. Each test produces specific findings; the value of the test is in the bugs found and fixed before they cause real outages.

The availability zone isolation test is the chaos engineering exercise that distinguishes claimed multi-AZ readiness from actual multi-AZ readiness. Nova AI Ops integrates with chaos engineering tools, runs scheduled AZ isolation tests, and produces the per-test report that drives architectural improvement over time.