SRE Best Practices · Practical · By Samson Tanimawo, PhD · Published Jul 6, 2026

The Soak Test That Catches Memory Leaks

Most leaks ship to production because soak tests are too short. The 72-hour test, the metrics to watch, and the leaks it has actually caught.

Why 72 hours

Many leaks grow slowly, so they only show up after hours or days of sustained load. A 1-hour test passes; an 8-hour test catches some; a 72-hour test catches most.

Beyond 72 hours, returns diminish. The 72-hour bar catches the leaks that matter without burning weeks.
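
To make the slow-leak argument concrete, here is a rough sketch with made-up numbers: a hypothetical leak of 2 MB per hour sitting under 50 MB of normal RSS jitter. The leak rate, the noise band, the 2x margin, and the function name are all illustrative assumptions, not measurements.

```python
def hours_until_visible(leak_mb_per_hour, noise_mb, margin=2.0):
    """Hours before the accumulated leak exceeds `margin` times the noise band."""
    if leak_mb_per_hour <= 0:
        raise ValueError("leak rate must be positive")
    return margin * noise_mb / leak_mb_per_hour


# Hypothetical numbers, chosen only to illustrate the arithmetic.
LEAK_MB_PER_HOUR = 2.0
NOISE_MB = 50.0

threshold_hours = hours_until_visible(LEAK_MB_PER_HOUR, NOISE_MB)

for duration in (1, 8, 72):
    leaked = LEAK_MB_PER_HOUR * duration
    verdict = "visible" if duration >= threshold_hours else "hidden in noise"
    print(f"{duration:>2} h test: {leaked:.0f} MB leaked, {verdict}")
```

Under these assumptions the leak needs about 50 hours to clear the noise, so the 1-hour and 8-hour runs pass cleanly and only the 72-hour run flags it.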

What to watch

RSS memory: should asymptote, not grow linearly. Linear growth is the classic leak.
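
One way to tell "asymptote" from "linear" is to look at the slope of the second half of the run only. The sketch below assumes RSS has already been sampled at a fixed interval into a list of megabyte values; the function names and the `min_slope` threshold are hypothetical and would need tuning per service.

```python
def slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(rss_mb, min_slope=0.1):
    """True if RSS is still climbing in the second half of the run.

    A healthy process warms up and then flattens, so its late slope is
    close to zero. A linear leak keeps climbing all the way through.
    `min_slope` is in MB per sample and is an assumed threshold.
    """
    late = rss_mb[len(rss_mb) // 2:]
    return slope(late) > min_slope


# Synthetic hourly samples over 72 hours (illustrative, not real data).
healthy = [100 + 400 * (1 - 0.5 ** h) for h in range(72)]  # warms up, flattens
leaky = [100 + 2 * h for h in range(72)]                   # grows 2 MB per hour

print("healthy flagged:", looks_like_leak(healthy))
print("leaky flagged:", looks_like_leak(leaky))
```

Ignoring the first half of the run keeps normal warm-up growth (caches filling, pools sizing) from being mistaken for a leak.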

Open file descriptors: should stabilise. Growing FDs are leaks too.
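
On Linux, a process's open descriptors appear as entries under `/proc/<pid>/fd`, so a sampler can be very small. This is a minimal sketch that assumes a Linux host and permission to read the target process's `/proc` entry; the function name is hypothetical.

```python
import os


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only).

    When counting the current process, the total includes the descriptor
    used to list the directory itself. That offset is constant, so it
    does not affect the trend.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Sampled at a fixed interval during the soak, the count should settle after warm-up. A series that keeps climbing usually points at sockets or files that are opened and never closed.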

Goroutine count (Go) or thread count (other languages): should plateau. Unbounded growth is a leak.
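
The same plateau check applies to any worker count. As a sketch, the helper below compares the peak in the last quarter of the run with the peak before it; the function name, the quarter split, and the `tolerance` value are assumptions. In a Python service the samples could come from `threading.active_count()`.

```python
import threading


def count_plateaued(samples, tolerance=2):
    """True if the last-quarter peak stays within `tolerance` of the earlier peak."""
    if len(samples) < 4:
        raise ValueError("need at least 4 samples")
    cut = len(samples) * 3 // 4
    return max(samples[cut:]) <= max(samples[:cut]) + tolerance


# One live sample from this process; a soak harness would record these
# at a fixed interval for the whole run.
current_threads = threading.active_count()

# Synthetic series (illustrative only).
steady = [8, 12, 12, 12, 12, 12, 12, 12]
growing = [8, 10, 12, 14, 16, 18, 20, 22]

print("steady plateaued:", count_plateaued(steady))
print("growing plateaued:", count_plateaued(growing))
```

A fixed tolerance will miss very slow growth, so for long runs a slope check over the later samples may be a better fit.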

GC pause time: should be steady. Growing pauses indicate heap pressure that may have a leak underneath.
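
How pause time is collected depends on the runtime. As one illustration, CPython exposes `gc.callbacks`, a list of callables invoked at the start and stop of each collection, which is enough to time its cyclic collector. The recorder class below is a hypothetical sketch, not a production profiler, and other runtimes expose pause data through their own APIs.

```python
import gc
import time


class GCPauseRecorder:
    """Record the duration of each CPython garbage collection, in seconds."""

    def __init__(self):
        self.pauses = []
        self._started = None

    def __call__(self, phase, info):
        # CPython calls this with phase "start" before a collection
        # and phase "stop" after it.
        if phase == "start":
            self._started = time.perf_counter()
        elif phase == "stop" and self._started is not None:
            self.pauses.append(time.perf_counter() - self._started)
            self._started = None

    def install(self):
        gc.callbacks.append(self)

    def uninstall(self):
        gc.callbacks.remove(self)


recorder = GCPauseRecorder()
recorder.install()
gc.collect()  # force one collection so there is something to record
recorder.uninstall()

print(f"collections seen: {len(recorder.pauses)}")
```

During a soak, the recorded pauses would be exported alongside the other metrics and checked for an upward trend rather than against a single absolute limit.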

Make it part of the release

Block release if soak fails. Without enforcement, the test is documentation.
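
Enforcement can be as simple as a pipeline step that returns a non-zero exit code when any soak check fails. The sketch below assumes the soak harness produces a mapping of check name to pass/fail; the check names and function names are hypothetical.

```python
def failed_checks(results):
    """Return the names of failed checks, sorted for stable output."""
    return sorted(name for name, passed in results.items() if not passed)


def gate_exit_code(results):
    """Return 0 if every soak check passed, 1 otherwise."""
    failures = failed_checks(results)
    for name in failures:
        print(f"soak check failed: {name}")
    return 1 if failures else 0


# Hypothetical results from a finished soak run.
example_results = {
    "rss_flat": True,
    "fds_stable": False,
    "threads_plateaued": True,
    "gc_pauses_steady": True,
}

code = gate_exit_code(example_results)
print("release blocked" if code else "release allowed")
```

Wired into a release pipeline, the step would end with `sys.exit(gate_exit_code(results))` so that a failed soak stops promotion of the release candidate.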

Soak runs against the release candidate, not main. The test is for the artifact going to production.

Cost: 72 hours of compute per release. Cheap relative to the cost of a leak in production.