perf Cheatsheet

CPU profiling.

Overview

perf is the Linux profiling and tracing tool built on the kernel's perf_events subsystem. The userspace tool is maintained in the kernel source tree and packaged by every major distribution, though it often has to be installed separately (for example as perf or linux-tools). Five primitives carry most operational use under load: CPU profiling, hardware-event counters, kernel and userspace tracepoints, flame-graph visualisation, and live perf top for in-the-moment investigation. Fluency turns "the system is slow" into a specific function name within minutes.

CPU profiling. perf record samples kernel and userspace stacks at a chosen frequency. Locates the hot code path.
Hardware events. Cache misses, branch mispredictions, and cycle and instruction counts (from which instructions per cycle is derived), alongside kernel software events such as context switches. Microarchitecture-level investigation when CPU% alone is not enough.
Tracepoints. Kernel and userspace static tracepoints plus dynamic kprobes/uprobes. Event-driven analysis without recompiling.
Flame graphs plus live mode. perf script | stackcollapse-perf.pl | flamegraph.pl produces the canonical visualisation (flamegraph.pl reads folded stacks, so the collapse step is required); perf top shows live CPU consumers as they run.
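
The event and tracepoint primitives above can be sketched as a few shell functions. This is a minimal sketch, assuming perf is installed and the caller is root or kernel.perf_event_paranoid is permissive. The function names and the PERF override variable are hypothetical conveniences for illustration, not part of perf; the event names are common ones, and perf list shows what a given kernel and CPU actually expose.

```shell
#!/bin/sh
# Assumption: perf is installed and usable by the current user.
# PERF is a hypothetical override hook (e.g. PERF="echo perf" for a dry run).
PERF="${PERF:-perf}"

# Show the events this kernel/CPU exposes.
# Optional filter argument, e.g. hw, sw, cache, tracepoint.
list_events() {
    $PERF list "$@"
}

# Count hardware and software events on all CPUs for SECS seconds.
count_events() {
    secs="${1:-10}"
    $PERF stat -e cycles,instructions,cache-misses,branch-misses,context-switches \
        -a -- sleep "$secs"
}

# Record a static kernel tracepoint (scheduler context switches) system-wide.
trace_sched_switch() {
    secs="${1:-5}"
    $PERF record -e sched:sched_switch -a -- sleep "$secs"
}

# Add a dynamic kprobe on kernel function FUNC, record it, then remove it.
# Assumes the probe is created under the default name probe:FUNC.
probe_kernel_function() {
    func="$1"
    secs="${2:-5}"
    $PERF probe --add "$func" &&
        $PERF record -e "probe:$func" -a -- sleep "$secs"
    $PERF probe --del "$func"
}
```

For example, count_events 10 counts for ten seconds across all CPUs, and probe_kernel_function tcp_sendmsg 5 traces calls to that kernel function for five seconds (tcp_sendmsg is only an illustrative target). After either recording, perf report or perf script reads the resulting perf.data.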

The approach

Live first, recorded sample second, flame graph for the report. Five idioms cover most operational use of perf and they are worth committing to muscle memory before the box is on fire.

perf top. Live CPU consumer view. First move for in-the-moment investigation.
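perf record -F 99 -a -g -- sleep 30. 30-second system-wide call-graph profile at 99 Hz. The -a flag samples all CPUs; without it only the sleep command itself is profiled. The low, off-round frequency keeps overhead small and avoids sampling in lockstep with periodic timer activity.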
perf report. Interactive TUI over the recorded data. Drill-down by symbol, by call graph, by CPU.
perf script | stackcollapse-perf.pl | flamegraph.pl plus perf stat. Flame graph for shareable output (stackcollapse-perf.pl folds the stacks that flamegraph.pl renders); perf stat reports hardware event counts for microarchitecture investigation.
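
The live, record, report, flame-graph sequence above can be sketched end to end as follows. This is a hedged sketch, not a definitive script: it assumes perf is installed with sufficient privileges, and that stackcollapse-perf.pl and flamegraph.pl from Brendan Gregg's FlameGraph repository are on PATH. The function names, the default output file flame.svg, and the PERF override variable are hypothetical.

```shell
#!/bin/sh
# Assumptions: perf is installed; stackcollapse-perf.pl and flamegraph.pl
# (FlameGraph repository) are on PATH.
# PERF is a hypothetical override hook (e.g. PERF="echo perf" for a dry run).
PERF="${PERF:-perf}"

# 1. Live view of the hottest symbols.
live_view() {
    $PERF top
}

# 2. Record call graphs on all CPUs at FREQ Hz for SECS seconds.
#    -a makes the profile system-wide; without it only `sleep` is sampled.
#    Writes ./perf.data.
record_system() {
    freq="${1:-99}"
    secs="${2:-30}"
    $PERF record -F "$freq" -a -g -- sleep "$secs"
}

# 3. Browse the recording interactively (reads ./perf.data by default).
browse() {
    $PERF report
}

# 4. Dump samples, fold each stack into one line, render an SVG flame graph.
flame_graph() {
    out="${1:-flame.svg}"
    $PERF script | stackcollapse-perf.pl | flamegraph.pl > "$out"
}
```

A typical session would be live_view to see what is hot right now, then record_system 99 30, then browse for drill-down and flame_graph report.svg for the shareable artefact. Keeping the steps as separate functions means one recording can be inspected several ways without re-sampling the system.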

Why this compounds

Each profile adds to the team's performance model. Hot code paths become legible, optimisation targets become specific, and microarchitecture concerns (cache behaviour, branch prediction) move from theoretical to measurable. By year two, perf is the first tool reached for in any "is the box ok" investigation.

Faster performance investigation. Fluency with perf shortens the path from symptom to root cause, so MTTR on compute-side incidents drops.
Better optimisation targets. Profiling shows what to optimise; engineering effort goes to the right code.
Microarchitecture insight. Hardware events reveal what the CPU is actually doing. Deep performance work becomes possible.
Year-one investment, year-two habit. First year builds fluency; by year two, perf is the default first tool on any troubled box.