Memory Leaks: Finding Them
Detection patterns.
Overview
Finding memory leaks is about isolating which allocations grow unbounded, not just restarting the process when RSS gets uncomfortable. The four signals below cover almost every leak shape long-running services produce.
- RSS growth over time. Plot resident set size per process across days. A monotonic upward slope on a steady-state workload is the canonical leak signature.
- Heap profiles. Language-specific tools (pprof for Go, jemalloc heap profiling, V8 heap snapshots, dotnet-gcdump for .NET) reveal which allocation sites or object types grow.
- Reference tracking. Retention paths show why allocated objects are not freed. Closures, caches, and event-listener registries are the usual culprits.
- Continuous profiling. Always-on sampling profilers deployed across the cluster (Pyroscope, Parca, Datadog Continuous Profiler) catch leaks in production rather than only in load tests.
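The RSS signal can be turned into a simple check. The sketch below is a minimal illustration, not a production detector: it assumes RSS samples are already collected as (unix_seconds, rss_bytes) pairs, and the function names and thresholds (1 MB/hour, 80% of intervals rising) are hypothetical values chosen for the example. It fits a least-squares line to the samples and also requires that most consecutive samples rise, so a single spike does not read as a leak.

```python
def rss_slope_mb_per_hour(samples):
    """Least-squares slope of RSS over time, in MB per hour.

    samples: list of (unix_seconds, rss_bytes) pairs, oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(t - t0) / 3600.0 for t, _ in samples]   # hours since first sample
    ys = [rss / 2**20 for _, rss in samples]       # bytes -> MB
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples need distinct timestamps")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def rising_fraction(samples):
    """Fraction of consecutive sample pairs where RSS increased."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ups = sum(1 for (_, a), (_, b) in zip(samples, samples[1:]) if b > a)
    return ups / (len(samples) - 1)


def looks_like_leak(samples, min_slope_mb_per_hour=1.0, min_rising=0.8):
    """Hypothetical heuristic: sustained upward slope, not a one-off jump."""
    return (
        rss_slope_mb_per_hour(samples) >= min_slope_mb_per_hour
        and rising_fraction(samples) >= min_rising
    )


if __name__ == "__main__":
    MB = 2**20
    # Synthetic data: one sample per hour for a day, growing 5 MB per hour.
    leaking = [(h * 3600, 500 * MB + 5 * MB * h) for h in range(24)]
    print(round(rss_slope_mb_per_hour(leaking), 2), looks_like_leak(leaking))
```

The thresholds only make sense on a steady-state workload. A service whose traffic ramps during the day will show a rising RSS without leaking, so compare windows with similar load.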
The approach
The investigation order matters. Watch RSS first, profile when growth is suspected, and dig into reference paths only after the profile points to a specific allocation site.
- Monitor RSS growth. Per-process line on the dashboard. Anomalous slope triggers the rest of the investigation.
- Heap profile on suspicion. Take two snapshots an hour apart and diff them. The growing allocations are usually obvious.
- Reference path. Trace the retention path for the largest growing allocation. Most leaks come from a single bad reference.
- Continuous profiling and documented fix. Always-on profiling catches the next leak earlier; each fix is documented so the same shape does not return.
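The snapshot-and-diff step can be sketched with Python's standard tracemalloc module. This is an illustration under stated assumptions: handle_request and its module-level list are a hypothetical leak invented for the demo, and the two snapshots are taken a few iterations apart rather than an hour apart. Other runtimes have their own equivalents; the diffing idea is the same.

```python
import tracemalloc

# Hypothetical leak for the demo: a module-level list that is never trimmed.
_leaky_cache = []


def handle_request(i):
    """Stand-in for real work; retains roughly 10 KB per call."""
    _leaky_cache.append(bytearray(10_000))


def top_growth(before, after, limit=5):
    """Return the allocation sites that grew most between two snapshots."""
    diffs = after.compare_to(before, "lineno")
    growing = [d for d in diffs if d.size_diff > 0]
    growing.sort(key=lambda d: d.size_diff, reverse=True)
    return growing[:limit]


def demo():
    tracemalloc.start()
    for i in range(100):          # warm-up, so startup allocations are in the baseline
        handle_request(i)
    before = tracemalloc.take_snapshot()
    for i in range(1000):         # the window in which the leak grows
        handle_request(i)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return top_growth(before, after)


if __name__ == "__main__":
    for diff in demo():
        frame = diff.traceback[0]
        print(f"{frame.filename}:{frame.lineno} "
              f"+{diff.size_diff / 1024:.0f} KiB, +{diff.count_diff} blocks")
```

Taking the first snapshot after a warm-up period matters: caches and lazy imports that fill once would otherwise show up as growth and bury the real leak.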
Why this compounds
Leak-hunting fluency compounds because the same toolchain serves every long-running service the team operates. Each leak hunted teaches a little more about the runtime.
- Better stability. Fixed leaks reduce restart frequency and remove a recurring source of low-grade incidents.
- Better resource utilisation. Memory that is no longer leaked can be reclaimed as smaller instances or higher density on the same hardware.
- Pattern library. Each leak documented becomes part of the team’s runtime mental model. Future investigations skip steps.
- Year-one investment, year-two habit. The first leak is a heavy lift. By the third, the team can hunt one without external help.