Performance Intermediate By Samson Tanimawo, PhD Published Oct 19, 2026 9 min read

CPU Bottleneck Diagnosis with Flame Graphs

Flame graphs are the single most useful CPU debugging tool. Reading them is teachable; capturing them is one command.

Why flame graphs

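Flame graphs visualize where CPU time goes, aggregated by call stack. Each bar is a stack frame. Its width is proportional to the number of samples in which that frame was on the stack, so wide bars are CPU hotspots. Its height in the graph shows call depth, not cost. The horizontal axis is not a timeline; frames are typically sorted alphabetically so identical stacks merge.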

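Because the patterns are visual, you can skim the graph for the widest frames at the top of the stacks instead of reading through rows of profiler output.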
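To make the width and depth idea concrete, here is a minimal sketch that reads stacks in the "folded" text format (frames joined by `;`, root first, followed by a sample count) and reports how wide each frame would be. The stack data and function names are made up for illustration, and the sketch merges frames by name, whereas a real flame graph draws a separate bar per call path.

```python
from collections import Counter

# Folded-stack lines: frames joined by ';' (root first), a space, then a sample count.
# These stacks are invented illustration data, not a real capture.
FOLDED = """\
main;handle_request;parse_json 60
main;handle_request;render;escape_html 25
main;handle_request;render 5
main;idle 10
"""

def frame_stats(folded: str):
    """Return (inclusive samples per frame name, deepest stack depth, total samples)."""
    inclusive = Counter()
    max_depth = 0
    total = 0
    for line in folded.splitlines():
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        samples = int(count)
        frames = stack.split(";")
        total += samples
        max_depth = max(max_depth, len(frames))
        # Count each frame once per stack so recursion does not inflate its width.
        for frame in set(frames):
            inclusive[frame] += samples
    return inclusive, max_depth, total

inclusive, max_depth, total = frame_stats(FOLDED)
for frame, samples in inclusive.most_common():
    # Width of the bar is the frame's share of all samples.
    print(f"{frame:16s} {100 * samples / total:5.1f}% of samples")
print("deepest stack:", max_depth, "frames")
```

In this toy data `parse_json` is the widest leaf, so it is the first candidate to inspect, even though the `escape_html` stack is taller.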

Four-step workflow

Language tools

Linux perf + flamegraph.pl: cross-language; profiles native and kernel stacks with low overhead.

async-profiler: JVM, low overhead.

go test -cpuprofile + go tool pprof: native Go.

py-spy: Python sampling.
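As a rough reference, typical capture invocations for the four tools above can be collected in one small helper. This is a sketch under assumptions: the PID, duration, and output file names are placeholders, `stackcollapse-perf.pl` and `flamegraph.pl` are assumed to be on the PATH from the FlameGraph scripts, and `asprof` is the launcher name in async-profiler 3.x (older releases ship `profiler.sh`). Check each tool's help output for your installed version.

```python
def capture_command(tool: str, pid: int = 1234, seconds: int = 30) -> str:
    """Return a typical shell command for capturing a CPU flame graph.

    The pid, duration, and output file names are placeholder values.
    """
    commands = {
        # Sample all stacks of the process at 99 Hz, then fold and render to SVG.
        "perf": (
            f"perf record -F 99 -g -p {pid} -- sleep {seconds} && "
            "perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg"
        ),
        # Attach to a running JVM and write an HTML flame graph.
        "async-profiler": f"asprof -d {seconds} -f flame.html {pid}",
        # Profile benchmarks in the current package, then open the pprof web UI.
        "go": "go test -bench=. -cpuprofile cpu.prof && go tool pprof -http=:8080 cpu.prof",
        # Sample a running Python process and write an SVG flame graph.
        "py-spy": f"py-spy record -d {seconds} -o flame.svg --pid {pid}",
    }
    return commands[tool]

if __name__ == "__main__":
    for tool in ("perf", "async-profiler", "go", "py-spy"):
        print(f"{tool}:\n  {capture_command(tool)}\n")
```

The helper only builds strings; it does not run anything, so it is safe to paste into a runbook and adapt.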

False-positive checks

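Wide system call: the time is spent in the kernel, not in application code; check what the kernel is doing (I/O, locking, memory) and how often the app makes that call before optimizing user code.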

Wide GC frame: a sign of GC pressure; reduce the allocation rate or tune memory settings.

Wide framework call: maybe expected; verify against baseline.
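One way to run the baseline check is to compare each frame's share of samples between a known-good profile and the current one, and flag only the frames that grew. The sketch below assumes both profiles are available as folded stacks; the frame names, sample counts, and the 5-point threshold are hypothetical.

```python
def frame_share(folded: str) -> dict[str, float]:
    """Fraction of samples in which each frame name appears anywhere on the stack."""
    counts: dict[str, int] = {}
    total = 0
    for line in folded.strip().splitlines():
        stack, _, n = line.strip().rpartition(" ")
        samples = int(n)
        total += samples
        for frame in set(stack.split(";")):
            counts[frame] = counts.get(frame, 0) + samples
    return {frame: c / total for frame, c in counts.items()}

def regressions(baseline: str, current: str, threshold: float = 0.05):
    """Frames whose share grew by at least `threshold`, largest growth first."""
    base, cur = frame_share(baseline), frame_share(current)
    grown = [(f, cur[f] - base.get(f, 0.0)) for f in cur]
    return sorted((fd for fd in grown if fd[1] >= threshold), key=lambda fd: -fd[1])

# Invented illustration data: the framework frame is wide in both profiles.
BASELINE = """
main;framework_dispatch;handler 50
main;framework_dispatch 30
main;gc_collect 5
main;sys_write 15
"""
CURRENT = """
main;framework_dispatch;handler 45
main;framework_dispatch 35
main;gc_collect 15
main;sys_write 5
"""

for frame, delta in regressions(BASELINE, CURRENT):
    print(f"{frame}: +{round(100 * delta)} points vs baseline")
```

Here `framework_dispatch` covers 80% of samples in both profiles, so it is not flagged, while the growth in `gc_collect` is.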

Antipatterns

What to do this week

Three moves. (1) Capture a flame graph for your slowest production endpoint and find its widest frames. (2) Measure p99 latency before and after the fix. (3) Document the win and ship the runbook so the team can reproduce it.
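For the p99 comparison, a minimal sketch using the nearest-rank percentile definition is shown below. The latency samples are synthetic, and in practice your metrics system may use a different percentile estimator, so compare before and after with the same method.

```python
def p99(latencies_ms):
    """Nearest-rank 99th percentile: smallest sample with at least 99% of samples at or below it."""
    ordered = sorted(latencies_ms)
    if not ordered:
        raise ValueError("no samples")
    # Integer form of ceil(0.99 * n), which avoids floating-point rounding.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Synthetic samples: 2% of requests hit the slow path in both runs.
before = [12.0] * 980 + [250.0] * 20
after = [11.0] * 980 + [90.0] * 20

print("p99 before:", p99(before), "ms")
print("p99 after: ", p99(after), "ms")
```

With this data the median barely moves while the p99 drops, which is why the tail percentile is the number to record in the runbook.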