Cloud & Infrastructure Practical By Samson Tanimawo, PhD Published Jun 5, 2026 4 min read

Availability Zone Isolation Test

An availability zone (AZ) isolation test is a chaos engineering exercise that simulates the loss of one AZ. This post covers the test scenario, the metrics to watch, and the bugs it has caught.

The scenario

Block all traffic to one AZ at the network layer, so that workloads in that AZ become unreachable.

The other AZs are expected to continue serving traffic.
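One way to picture the block is as a set of host firewall rules. The sketch below is a minimal illustration, not a definitive implementation: it assumes Linux hosts that use iptables, and the AZ names and subnet CIDRs in `AZ_SUBNETS` are hypothetical placeholders. It only builds the command strings and the matching rollback; it does not execute anything.

```python
import ipaddress

# Hypothetical mapping of AZ name to subnet CIDRs; substitute your own.
AZ_SUBNETS = {
    "az-a": ["10.0.0.0/24"],
    "az-b": ["10.0.1.0/24"],
    "az-c": ["10.0.2.0/24"],
}


def isolation_rules(target_az, az_subnets=AZ_SUBNETS):
    """Return iptables commands that drop traffic to and from target_az."""
    rules = []
    for cidr in az_subnets[target_az]:
        # Validate the CIDR; raises ValueError if it is malformed.
        net = ipaddress.ip_network(cidr)
        rules.append(f"iptables -I INPUT -s {net} -j DROP")
        rules.append(f"iptables -I OUTPUT -d {net} -j DROP")
    return rules


def rollback_rules(rules):
    """Turn each insert (-I) into the matching delete (-D)."""
    return [rule.replace(" -I ", " -D ", 1) for rule in rules]


if __name__ == "__main__":
    block = isolation_rules("az-b")
    for cmd in block + rollback_rules(block):
        print(cmd)
```

Generating the rollback from the same list keeps the cleanup in step with whatever was applied. In a cloud environment the same idea is usually expressed through the provider's own network controls rather than host firewalls.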

Metrics

Service latency p99: should not rise more than N% above its baseline during the test, where N is a threshold you choose in advance.

Error rate: should remain at baseline.

Capacity: the remaining AZs should absorb the displaced load without saturating.
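The three checks above can be written down as a small pass/fail evaluation. This is a sketch under stated assumptions: the `Snapshot` type, the function names, and the even-spread capacity model are illustrative and not taken from any particular monitoring tool. The latency threshold (the N% above) is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    p99_ms: float       # p99 service latency in milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def projected_utilization(per_az_utilization, az_count):
    """Utilization of each surviving AZ, assuming load spreads evenly
    across the remaining AZs after one is lost."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive losing one")
    return per_az_utilization * az_count / (az_count - 1)


def evaluate(baseline, during, max_p99_increase_pct, error_rate_tolerance=0.0):
    """Compare metrics taken during the test against the baseline."""
    p99_limit = baseline.p99_ms * (1 + max_p99_increase_pct / 100)
    return {
        "p99_ok": during.p99_ms <= p99_limit,
        "errors_ok": during.error_rate <= baseline.error_rate + error_rate_tolerance,
    }


if __name__ == "__main__":
    baseline = Snapshot(p99_ms=200.0, error_rate=0.001)
    during = Snapshot(p99_ms=230.0, error_rate=0.001)
    print(evaluate(baseline, during, max_p99_increase_pct=20))
    # Three AZs at 50% each: the two survivors would run at 75%.
    print(projected_utilization(0.5, 3))
```

The capacity projection is worth running before the test: if the projected utilization is at or above 1.0, the remaining AZs cannot absorb the load under this model and the latency check is likely to fail.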

Bugs found

Single-AZ databases that nobody had noticed. Quorum problems that surfaced only during the partition.

Sticky DNS records that did not honour health checks.
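The single-AZ database and quorum findings come down to simple arithmetic on replica placement. The sketch below assumes a majority quorum over a fixed set of voting replicas; the function names and AZ labels are hypothetical, and real systems may use different quorum rules.

```python
def has_quorum(replicas_per_az, lost_az):
    """True if a strict majority of replicas survives the loss of lost_az."""
    total = sum(replicas_per_az.values())
    majority = total // 2 + 1
    surviving = total - replicas_per_az.get(lost_az, 0)
    return surviving >= majority


def unsafe_azs(replicas_per_az):
    """List the AZs whose loss would leave the cluster without quorum."""
    return [az for az in replicas_per_az if not has_quorum(replicas_per_az, az)]


if __name__ == "__main__":
    print(unsafe_azs({"az-a": 1, "az-b": 1, "az-c": 1}))  # one replica per AZ
    print(unsafe_azs({"az-a": 2, "az-b": 1}))             # two replicas share an AZ
    print(unsafe_azs({"az-a": 3}))                        # single-AZ database
```

With one replica in each of three AZs, any single AZ can be lost. Placing two of three replicas in the same AZ, or all of them in one, makes that AZ a single point of failure even though the replica count looks healthy.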