Memory Leaks: Finding and Fixing
Memory leaks accumulate silently; investigations are painful. The process below cuts time-to-fix substantially.
Recognizing leaks
Memory leak: memory grows without bound until the OOM killer terminates the process.
Distinguish it from cache growth (caches plateau) and from load growth (memory tracks traffic).
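To automate the leak-vs-cache distinction, a minimal sketch in Python (the sample series and the 5% plateau threshold are illustrative; real dashboards need longer windows, and load growth needs a request-rate series to compare against):

```python
def classify_memory_trend(samples_mb: list[float]) -> str:
    """Classify evenly spaced RSS samples (MB) as plateau or leak.

    Heuristic: compare growth in the first half of the window with
    growth in the second half. A warming cache flattens out; a leak
    keeps climbing. The 5% threshold is illustrative -- tune it.
    """
    half = len(samples_mb) // 2
    first_growth = samples_mb[half - 1] - samples_mb[0]
    second_growth = samples_mb[-1] - samples_mb[half]
    if second_growth < 0.05 * max(first_growth, 1e-9):
        return "plateau (cache-like)"
    return "still growing (possible leak)"

# Hourly RSS samples: a cache warms up, then flattens...
print(classify_memory_trend([200, 380, 460, 485, 490, 491, 492]))
# ...while a leak climbs at a roughly constant rate.
print(classify_memory_trend([200, 260, 320, 385, 450, 510, 575]))
```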
Four-symptom checklist
1. Restart cycle: the service restarts every N hours, leaving a sawtooth on the memory graph.
2. Slow climb: memory in the dashboard rises steadily between restarts.
3. GC frequency rising as the heap fills (see the sketch after this list).
4. Latency degrading slowly over hours under growing GC pressure.
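For symptom 3, CPython exposes `gc.callbacks`, so collection frequency can be tracked as a metric and alerted on when it trends up. A sketch; the window length and the metric export are illustrative:

```python
import gc
import time
from collections import deque

# Timestamps of recent GC cycles; used to compute collections per window.
_gc_times: deque[float] = deque(maxlen=1000)

def _on_gc(phase: str, info: dict) -> None:
    # CPython invokes callbacks at the start and stop of each collection.
    if phase == "stop":
        _gc_times.append(time.monotonic())

gc.callbacks.append(_on_gc)

def gc_per_minute(window_s: float = 60.0) -> int:
    """Collections completed in the last `window_s` seconds."""
    cutoff = time.monotonic() - window_s
    return sum(1 for t in _gc_times if t >= cutoff)

# Export as a gauge to your metrics system (name is illustrative):
# metrics.gauge("proc.gc_per_minute", gc_per_minute())
```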
Heap-dump analysis
- JVM: heap dump analyzed in Eclipse MAT.
- Go: heap profile via pprof.
- Python: tracemalloc plus objgraph.
Each language has its tool; learn yours.
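For the Python route named above, the core tracemalloc move is diffing two snapshots taken some time apart; allocation sites that only grow are the leak suspects. A minimal sketch with a simulated leak (`suspect_workload` is a placeholder for the code path you suspect):

```python
import time
import tracemalloc

leak: list = []  # simulated leak: a module-level list that only grows

def suspect_workload() -> None:
    # Stand-in for the suspect code path (name is a placeholder).
    leak.extend(object() for _ in range(100_000))

tracemalloc.start(25)  # keep 25 frames so traces point at real call sites

before = tracemalloc.take_snapshot()
suspect_workload()
time.sleep(1)  # in production, wait minutes between snapshots
after = tracemalloc.take_snapshot()

# Allocation sites sorted by net growth; the leak floats to the top.
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```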
Safer alternatives
Continuous profiling beats incident-time heap dumps.
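One homegrown approximation, assuming a Python service: a daemon thread that periodically logs the top allocation sites, giving a trend line over days instead of a one-shot dump during an incident. A sketch; tracemalloc adds nontrivial overhead, so dedicated continuous profilers are lighter for production use:

```python
import logging
import threading
import time
import tracemalloc

def start_allocation_logger(interval_s: float = 300.0, top_n: int = 10) -> None:
    """Periodically log the top allocation sites in a daemon thread.

    A crude stand-in for a real continuous profiler. The interval,
    top_n, and log destination are illustrative.
    """
    tracemalloc.start()

    def loop() -> None:
        while True:
            time.sleep(interval_s)
            snapshot = tracemalloc.take_snapshot()
            for stat in snapshot.statistics("lineno")[:top_n]:
                logging.info("alloc %s", stat)

    threading.Thread(target=loop, daemon=True, name="alloc-logger").start()
```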
Per-request memory accounting (allocation sampling) flags growth in specific code paths.
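A sketch of the per-request idea in Python: diff tracemalloc's traced-memory counter around each handler and emit the net bytes as a per-endpoint metric. The decorator, endpoint label, and log format are hypothetical:

```python
import functools
import logging
import tracemalloc

tracemalloc.start()

def track_allocations(endpoint: str):
    """Decorator: log net bytes still allocated after a request.

    A handler whose net allocation is consistently positive is holding
    on to memory; graphed per endpoint, a leak shows up as a slope.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            before, _ = tracemalloc.get_traced_memory()
            try:
                return fn(*args, **kwargs)
            finally:
                after, _ = tracemalloc.get_traced_memory()
                logging.info("net_alloc_bytes endpoint=%s value=%d",
                             endpoint, after - before)
        return wrapper
    return decorator

@track_allocations("/search")
def handle_search(query: str) -> list[str]:
    return [query] * 10
```

Allocations from other threads pollute any single diff; averaging over many requests per endpoint smooths that out.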
Catch memory regressions in CI: fail the PR if the test process holds more than X memory.
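A minimal CI gate, assuming a Unix runner: assert on peak RSS at the end of the test job. The 512 MB budget is an illustrative stand-in for X; note that `ru_maxrss` is kilobytes on Linux but bytes on macOS:

```python
import resource
import sys

MAX_RSS_MB = 512  # illustrative budget; tune per service

def check_memory_budget() -> None:
    """Fail the build if peak RSS exceeds the budget.

    ru_maxrss is reported in kilobytes on Linux and bytes on macOS,
    so normalize before comparing.
    """
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    peak_mb = ru_maxrss / divisor
    if peak_mb > MAX_RSS_MB:
        raise SystemExit(
            f"memory budget exceeded: peak {peak_mb:.0f} MB > {MAX_RSS_MB} MB"
        )

if __name__ == "__main__":
    # Run after your test workload, e.g. as the last step of the CI job.
    check_memory_budget()
```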
Antipatterns
- Diagnosing in production with stop-the-world dumps: on a large heap, the pause itself can become the outage.
- Restarting as the ‘fix’: it treats the symptom, and the leak returns on schedule.
- Blaming the framework: leaks usually live in application code (unbounded caches, listeners never unregistered, globals that accumulate).
What to do this week
Three moves. (1) Run this checklist against the service that restarts most often. (2) Measure the memory growth rate before and after the fix. (3) Document the win and ship the runbook so the team can reproduce it.