Cloud & Infrastructure
Practical
By Samson Tanimawo, PhD
Published Jun 5, 2026
4 min read
Availability Zone Isolation Test
AZ failures are tested by chaos engineering. The test scenario, the metrics to watch, and the bugs it has caught.
The scenario
Block all traffic to one AZ at the network layer. Workloads in that AZ are unreachable.
Other AZs continue serving.
Metrics
Service latency p99: should not spike beyond N% during the test.
Error rate: should remain at baseline.
Capacity: remaining AZs should absorb the load.
Bugs found
Single-AZ databases that nobody noticed. Quorum issues during the partition.
Sticky DNS records that did not honour health checks.