Flame Graphs for Performance
Read flame graphs.
Overview
A flame graph is a visualisation of sampled stack traces that makes hot paths easy to spot. Profiler output as text is hard to scan; the flame graph turns the same samples into a single image in which the widest frames are the ones where the most time was spent. The core skill is reading width as time.
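- Read flame graphs. Each graph points to the hot path; wide frames along the top edge are the functions that were actually running when most samples were taken.
- Stack-trace visualisation. Each graph shows the call hierarchy: the bottom row holds the entry points, every frame sits on top of its caller, and the top edge shows the leaf functions consuming CPU.
- Width equals time. A frame's width is proportional to the share of samples in which it appears on the stack, that is, the share of profiled time spent in it and its callees. Depth only shows call nesting, and in a classic flame graph the left-to-right order is typically alphabetical rather than chronological; only width measures time.
- Per-language profiler plus continuous profiling. Each language has a profiler that can produce flame graphs (pprof for Go, async-profiler for the JVM, py-spy for Python); always-on profiling across a cluster lets hot paths surface from real production traffic instead of one-off investigations.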
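The "width equals time" rule can be made concrete with the folded (collapsed) stack format consumed by Brendan Gregg's FlameGraph tools: one line per unique stack, frames joined by semicolons root-first, followed by a sample count. The minimal sketch below assumes the stack samples are already available as lists of frame names ordered root-first; the function names and sample data are hypothetical. It folds the samples and computes each box's width as a fraction of all samples, keying boxes by their full path so that the same function called from two places gets two boxes, as it does in the rendered graph.

```python
from collections import Counter


def fold_stacks(samples: list[list[str]]) -> Counter:
    """Collapse root-first stack samples into folded-stack lines with counts."""
    return Counter(";".join(stack) for stack in samples)


def frame_widths(folded: Counter) -> dict[tuple[str, ...], float]:
    """Width of each flame graph box as a fraction of all samples.

    A box is identified by its full path from the root, so the same
    function reached through two different callers gets two boxes.
    """
    total = sum(folded.values())
    inclusive: Counter = Counter()
    for line, count in folded.items():
        frames = tuple(line.split(";"))
        # Every prefix of the stack is a box this sample passes through.
        for depth in range(1, len(frames) + 1):
            inclusive[frames[:depth]] += count
    return {path: count / total for path, count in inclusive.items()}


if __name__ == "__main__":
    # Hypothetical samples from a request-handling service.
    samples = [
        ["main", "handle_request", "parse_json"],
        ["main", "handle_request", "parse_json"],
        ["main", "handle_request", "query_db"],
        ["main", "gc"],
    ]
    folded = fold_stacks(samples)
    for line, count in sorted(folded.items()):
        print(f"{line} {count}")
    # main;gc 1
    # main;handle_request;parse_json 2
    # main;handle_request;query_db 1
    widths = frame_widths(folded)
    print(widths[("main", "handle_request")])  # 0.75
    print(widths[("main", "handle_request", "parse_json")])  # 0.5
```

Half of all samples pass through `parse_json`, so its box spans half the graph's width no matter how deep it sits or where it is drawn horizontally.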
The approach
The practical approach: profile first, generate the flame graph, read it by width, run continuous profiling, and document each finding. This discipline produces evidence-based optimisation rather than ritual micro-tuning.
- Profile first. Profile the process before any optimisation; the data points to the actual hot path.
- Flame graph investigation. Render each profile as a flame graph; the hot frame is usually obvious within seconds.
- Width-aware reading. A frame's width is its time; depth only shows how deeply it is nested, so focus on the wide frames.
- Continuous profiling plus documented findings. Run always-on profiling per cluster, and commit the rationale behind each finding so it is available for operational reviews.
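Reading by width usually means asking two questions of the same profile: which functions are widest overall (inclusive time, including callees), and which functions sit on the top edge of the graph (self time, where the CPU actually was). A minimal sketch, assuming the folded-stack format described in the overview as input; `rank_hot_frames` and the sample counts are hypothetical. Each function is counted once per sample even when recursion places it on the stack several times, so recursive frames are not double-counted.

```python
from collections import Counter


def rank_hot_frames(folded: dict[str, int], top: int = 3) -> tuple[list, list]:
    """Return the top functions by self samples and by inclusive samples.

    folded maps a semicolon-joined, root-first stack to its sample count.
    """
    self_samples: Counter = Counter()
    inclusive_samples: Counter = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        # The leaf (last frame) is where the CPU was when the sample fired.
        self_samples[frames[-1]] += count
        # Count each function once per sample, so recursion is not double-counted.
        for name in set(frames):
            inclusive_samples[name] += count
    return self_samples.most_common(top), inclusive_samples.most_common(top)


if __name__ == "__main__":
    # Hypothetical folded profile: 100 samples in total.
    folded = {
        "main;handle_request;parse_json": 60,
        "main;handle_request;query_db": 25,
        "main;gc": 15,
    }
    by_self, by_total = rank_hot_frames(folded)
    print(by_self)   # [('parse_json', 60), ('query_db', 25), ('gc', 15)]
    print(by_total)  # [('main', 100), ('handle_request', 85), ('parse_json', 60)]
```

The inclusive ranking is dominated by entry points such as `main`, which are wide simply because everything runs beneath them; the self ranking points at the frame worth optimising, here `parse_json`.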
Why this compounds
Flame graph discipline compounds across services. Each profile can yield lasting performance gains, the team's runtime expertise grows, and new services inherit the profile-first culture.
- Better performance. Right hot path identified; optimisation lands on the frame that actually matters.
- Better cost efficiency. Faster code reduces compute cost; the optimisation pays back in the cloud bill.
- Better engineering culture. Profile-first culture produces evidence-based decisions; measured width replaces speculation.
- Institutional knowledge. Each profile teaches runtime patterns; the team’s performance engineering muscle grows.
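The cost argument can be checked with a back-of-the-envelope estimate before any work starts. Assuming a CPU-bound service whose compute cost scales with CPU time, if a frame's width is a fraction f of all samples and the optimisation makes that frame s times faster, the remaining CPU time is (1 - f) + f / s of the original (Amdahl's law). The function name and the figures below are illustrative assumptions, not measurements.

```python
def remaining_cpu_fraction(frame_share: float, speedup: float) -> float:
    """Fraction of original CPU time left after speeding up one frame (Amdahl's law).

    frame_share: the frame's width as a fraction of all samples (0..1).
    speedup: how many times faster the frame becomes (must be > 0).
    """
    if not 0.0 <= frame_share <= 1.0:
        raise ValueError("frame_share must be between 0 and 1")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - frame_share) + frame_share / speedup


if __name__ == "__main__":
    # Hypothetical: a frame 40% wide made 2x faster.
    print(f"CPU time remaining: {remaining_cpu_fraction(0.40, 2.0):.0%}")  # CPU time remaining: 80%
    # Hypothetical: a frame 5% wide made 10x faster.
    print(f"CPU time remaining: {remaining_cpu_fraction(0.05, 10.0):.1%}")
```

The comparison shows why width matters for the bill: halving a 40%-wide frame saves 20% of CPU time, while a tenfold speedup on a 5%-wide frame saves only 4.5%.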
Flame graph discipline is an engineering practice that pays off over years. Nova AI Ops integrates with profiling telemetry, surfaces patterns, and supports the team's performance discipline.