CPU Bottleneck Diagnosis with Flame Graphs
Flame graphs are among the most useful tools for diagnosing CPU bottlenecks. Reading them is a teachable skill, and capturing one is usually a single command.
Why flame graphs
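A flame graph shows where CPU time goes, aggregated by call stack. Each box is a function, and its width is proportional to the number of samples in which that function was on the stack, so wide boxes are CPU hotspots. The vertical axis is stack depth: tall towers mean deep call chains, not expensive ones. In a classic flame graph the x-axis is sorted alphabetically rather than by time, which lets identical stacks merge into one box.
Because the patterns are visual, you can skim a graph and spot the dominant hotspot without reading tables of numbers.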
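The widths come straight from sample counts. flamegraph.pl takes "folded" stacks as input: one line per unique stack, frames joined by semicolons from root to leaf, then a space and a sample count. The Python sketch below uses a hypothetical three-line profile with made-up frame names and computes each bar's width as the share of samples whose stack begins with that bar's prefix.

```python
from collections import Counter


def parse_folded(text):
    """Parse folded-stack lines ("root;child;leaf count") into {stack tuple: samples}."""
    stacks = Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def bar_widths(stacks):
    """Width of every bar as a fraction of all samples.

    A bar is a stack prefix: its width is the total count of every stack
    that starts with that prefix, so parents are at least as wide as children.
    """
    total = sum(stacks.values())
    widths = Counter()
    for frames, count in stacks.items():
        for depth in range(1, len(frames) + 1):
            widths[frames[:depth]] += count
    return {prefix: n / total for prefix, n in widths.items()}


# Hypothetical profile: 100 samples of a request handler (names are illustrative).
folded = """
main;handle_request;parse_json 60
main;handle_request;render 30
main;log 10
"""

widths = bar_widths(parse_folded(folded))
for prefix, width in sorted(widths.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{width:6.1%}  {';'.join(prefix)}")
```

Because a parent's width includes all of its children, main spans the whole graph while parse_json, at 60%, is the box worth investigating.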
Four-step workflow
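1. Capture a profile while the service handles representative, production-like load.
2. Generate the flame graph from the captured stacks.
3. Find the widest frames at the top of the stacks: they have the most self time and are where the CPU actually burns cycles.
4. Optimize, then capture again under the same load to confirm the change helped.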
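Step 3 relies on the difference between total time and self time. In the hypothetical profile below, handle_request appears in 90% of samples but is never the leaf, so tuning its own code would gain nothing; the cost sits in the frames at the tips. This sketch sums samples per leaf frame by name, roughly what pprof reports as flat time. Frame names and counts are made up for illustration.

```python
from collections import Counter


def self_samples(folded_text):
    """Samples per leaf frame: the last frame of each stack was on-CPU when sampled."""
    leaves = Counter()
    for line in folded_text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        leaves[stack.split(";")[-1]] += int(count)
    return leaves


# Hypothetical folded profile with 100 samples.
folded = """
main;handle_request;parse_json;decode_string 45
main;handle_request;parse_json 15
main;handle_request;render;escape_html 25
main;handle_request;render 5
main;log;write 10
"""

leaves = self_samples(folded)
total = sum(leaves.values())
for frame, n in leaves.most_common(3):
    print(f"{frame:<15} {n / total:.0%} self")
```

Merging leaves by name is a simplification: the same function reached from different callers is counted together, which is usually what you want when deciding what to optimize.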
Language tools
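Linux perf + flamegraph.pl: works for any natively compiled code with low sampling overhead; JIT-compiled runtimes need extra symbol support. Typical capture: perf record -F 99 -g -p PID -- sleep 30, then perf script | stackcollapse-perf.pl | flamegraph.pl > cpu.svg.
async-profiler: JVM, low overhead, and free of safepoint bias.
go test -cpuprofile + go tool pprof: built into Go; go tool pprof -http=:8080 cpu.prof serves a web UI that includes a flame graph view.
py-spy: sampling profiler for Python; attaches to a running process without code changes (py-spy record -o profile.svg --pid PID).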
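All of these tools are sampling profilers: they periodically record the current call stack and count identical stacks. The toy in-process sampler below shows the idea and emits the folded format flamegraph.pl expects. It relies on sys._current_frames(), a CPython implementation detail, and it is not how py-spy works internally (py-spy reads the target process's memory from outside). The busy and workload functions are hypothetical placeholders for real application code.

```python
import sys
import threading
import time
from collections import Counter


def sample_stacks(target_id, interval, stop, samples):
    """Every `interval` seconds, record the target thread's stack as a folded string."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_id)  # CPython-specific API
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            samples[";".join(reversed(names))] += 1  # root first, leaf last
        time.sleep(interval)


def busy(n):
    # Hypothetical CPU-bound hotspot.
    return sum(i * i for i in range(n))


def workload(seconds):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        busy(10_000)


samples = Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), 0.005, stop, samples),  # sample this thread every ~5 ms
    daemon=True,
)
sampler.start()
workload(0.5)
stop.set()
sampler.join()

# Folded lines ("stack count"), ready to pipe into a flame graph renderer.
for stack, count in samples.most_common(3):
    print(stack, count)
```

Sampling every few milliseconds instead of instrumenting every call is what keeps overhead low; the trade-off is that very short functions may be missed entirely.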
False-positive checks
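Wide system-call frame: the CPU time is in the kernel, not in your code. Before blaming the kernel, check whether the app makes the call too often (for example, many small reads or writes).
Wide GC frame: allocation pressure is forcing frequent collections; reduce allocations or tune heap settings rather than optimizing the code that happens to trigger GC.
Wide framework frame: may be normal for the workload; compare against a baseline profile from a known-good build before treating it as a regression.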
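A baseline comparison works best on shares of samples rather than raw counts, since two captures rarely contain the same number of samples. This sketch flags frames whose share grew by more than a threshold. The two profiles, the frame names, and the 5-point threshold are hypothetical; note that framework_dispatch is 100% wide in both and is correctly not flagged.

```python
from collections import Counter


def inclusive_share(folded_text):
    """Fraction of samples in which each frame name appears anywhere on the stack."""
    counts = Counter()
    total = 0
    for line in folded_text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        for name in set(stack.split(";")):  # count a recursive frame once per stack
            counts[name] += n
    return {name: c / total for name, c in counts.items()}


def grown_frames(baseline, current, threshold=0.05):
    """Frames whose share of samples grew by more than `threshold` (absolute) vs. the baseline."""
    before, after = inclusive_share(baseline), inclusive_share(current)
    rows = [
        (name, before.get(name, 0.0), share)
        for name, share in after.items()
        if share - before.get(name, 0.0) > threshold
    ]
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


# Hypothetical profiles: a known-good build and the build under investigation.
baseline = """
main;framework_dispatch;handler;query 70
main;framework_dispatch;middleware 30
"""
current = """
main;framework_dispatch;handler;query 50
main;framework_dispatch;middleware 50
"""

for name, was, now in grown_frames(baseline, current):
    print(f"{name}: {was:.0%} -> {now:.0%}")
```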
Antipatterns
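- One-shot capture without warmup: JIT compilation, cold caches, and lazy initialization dominate the first seconds, so the profile misrepresents steady state.
- Profiling in dev with no load: contention, queueing, and GC behavior under real traffic never appear, so the hotspots you see are not the ones production hits.
- Optimizing without re-profiling: you cannot tell whether the change helped, moved the cost elsewhere, or made things worse.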
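The warmup antipattern is easy to demonstrate with a timing harness; the same logic applies to when you start a profiler. In this sketch, load_config is a hypothetical stand-in for one-time costs such as JIT compilation or cache fills, simulated with a 50 ms sleep that lru_cache makes happen only once.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def load_config():
    # Hypothetical one-time cost standing in for JIT compilation, cache fills, or lazy imports.
    time.sleep(0.05)
    return {"mode": "fast"}


def handle_request():
    load_config()  # cheap after the first call
    return sum(range(1_000))


def mean_latency(fn, warmup, iterations):
    """Run `warmup` untimed calls, then return the mean seconds per call over `iterations`."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


load_config.cache_clear()
cold = mean_latency(handle_request, warmup=0, iterations=20)
load_config.cache_clear()
warm = mean_latency(handle_request, warmup=5, iterations=20)
print(f"no warmup: {cold * 1e3:.2f} ms/call, with warmup: {warm * 1e3:.3f} ms/call")
```

Without warmup, the single 50 ms initialization is spread across only 20 calls and dominates the average; a profile captured over the same window would show the same distortion.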
What to do this week
Three moves. (1) Apply this workflow to your slowest production endpoint. (2) Measure p99 latency before and after the change. (3) Document the win and ship the runbook so the team can reproduce it.
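For move (2), p99 can be computed from raw request latencies with the standard library. The numbers below are synthetic, chosen only to illustrate the calculation; they are not measurements.

```python
import statistics


def p99(latencies_ms):
    """99th percentile: quantiles(n=100) returns 99 cut points, and index 98 is the 99th."""
    return statistics.quantiles(latencies_ms, n=100)[98]


# Synthetic latencies for illustration only: 1,000 requests per run.
before = [12.0] * 980 + [250.0] * 20   # 2% of requests hit a slow path
after = [11.0] * 995 + [90.0] * 5      # slow path fixed, a few outliers remain

print(f"p99 before: {p99(before):.1f} ms, after: {p99(after):.1f} ms")
print(f"mean before: {statistics.mean(before):.1f} ms, after: {statistics.mean(after):.1f} ms")
```

The mean before the fix (about 16.8 ms) hides the 250 ms tail that p99 exposes. Conversely, p99 ignores the slowest 1%, so the five 90 ms outliers after the fix do not move it; collect enough requests that the percentile is stable.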