perf for CPU Profiling
perf identifies hot functions.
Record
perf is the standard Linux profiler: a command-line tool, maintained in the kernel source tree, that drives the kernel's perf_events subsystem. Engineers debugging CPU-bound problems reach for it because it produces detailed CPU profiles. Using it well is mostly a matter of knowing its options.
What recording looks like:
- perf record -p PID -g -- sleep 30 captures 30 seconds: The command samples the specified process for 30 seconds and writes the samples, including call stacks, to perf.data in the current directory for later analysis.
- -g for call graphs: The -g flag records the call stack with each sample. Without it, a sample identifies only the function that was executing (the leaf); with it, CPU time can also be attributed to higher-level callers.
- Sample rate: By default perf record samples at a fixed frequency (typically 4000 Hz). The rate trades overhead against resolution; set it explicitly with -F, for example -F 99 for a lighter capture.
- Targeted profiling: -p PID profiles a specific process, -t TID a specific thread, and -C CPU a specific CPU or list of CPUs. Note the capital C: lowercase -c sets the sample period instead. Choose the target that matches the investigation.
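- Frame-pointer support: Compilers often omit frame pointers in optimized builds, and perf's default frame-pointer-based stack walking yields truncated or misleading call graphs without them. Build with frame pointers preserved (for GCC and Clang, -fno-omit-frame-pointer), or fall back to --call-graph dwarf at the cost of larger captures.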
Recording is the data capture. The captured data drives the analysis.
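The recording options above can be sketched as a small shell helper. This is a minimal illustration, not a canonical wrapper: the function name record_profile, the PERF override variable, and the defaults (30 seconds, 99 Hz) are assumptions made for the example. The flags themselves (-F, -g, -p, -o, and the trailing sleep) are standard perf record usage.

```shell
#!/bin/sh
# Hypothetical helper: sample one process with call stacks for a bounded time.
# PERF can be overridden (for example PERF=echo) to preview the command
# without running it.
PERF="${PERF:-perf}"

record_profile() {
    pid="$1"            # target process ID, passed to -p
    seconds="${2:-30}"  # capture length; 30 s is an assumed default
    freq="${3:-99}"     # sample frequency in Hz, passed to -F; assumed default

    if [ -z "$pid" ]; then
        echo "usage: record_profile PID [SECONDS] [FREQ_HZ]" >&2
        return 2
    fi

    # -g records call stacks; "-- sleep N" ends the capture after N seconds.
    "$PERF" record -F "$freq" -g -p "$pid" -o perf.data -- sleep "$seconds"
}

# Example (not run here): record_profile 1234 30 99
```

Routing the call through a variable keeps the sketch easy to dry-run on a machine where perf is not installed or where profiling requires elevated privileges.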
Report
perf report reads perf.data and summarizes where CPU time was spent, which makes the optimization targets clear.
- perf report shows hot functions: The report sorts functions by their share of samples, so the hottest functions appear at the top. Optimization effort should start there.
- Interactive navigation: By default perf report opens an interactive terminal interface in which the team can expand call chains and drill into individual functions. The --stdio option prints a plain-text report instead.
- perf script for raw output: Beyond perf report, perf script dumps the recorded samples as text so that other tools can process them. FlameGraph generation is a common downstream use.
- FlameGraph for visualization: A flame graph draws the sampled stacks as stacked bars whose width is proportional to sample count. Hot paths are visible at a glance, which makes the profile easier to read than a text report.
- Compare profiles: perf diff compares two recordings, for example one taken before an optimization and one after, so the team can verify with data that the change helped.
Reporting is the analysis. The captured data becomes actionable insight.
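The analysis steps above might look like the following in practice. This is a sketch under stated assumptions: the function names and the PERF override variable are hypothetical, and the flame graph step assumes that stackcollapse-perf.pl and flamegraph.pl from the FlameGraph project are available on PATH.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to preview commands.
PERF="${PERF:-perf}"

# Plain-text report of the hottest symbols, suitable for logs or piping.
report_hot() {
    data="${1:-perf.data}"
    "$PERF" report -i "$data" --stdio --sort symbol
}

# Compare two recordings, e.g. before and after an optimization.
compare_profiles() {
    "$PERF" diff "$1" "$2"
}

# Render a flame graph from a recording.
# Assumption: the FlameGraph scripts are installed and on PATH.
flame_graph() {
    data="${1:-perf.data}"
    out="${2:-flame.svg}"
    "$PERF" script -i "$data" | stackcollapse-perf.pl | flamegraph.pl > "$out"
}

# Examples (not run here):
#   report_hot perf.data
#   compare_profiles before.data after.data
#   flame_graph perf.data flame.svg
```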
Careful
perf's overhead is small but not zero. Production use should be brief and targeted, and part of the skill is knowing when to use perf and when a lighter alternative will do.
- Production overhead is small but not zero: Sampling is efficient but not free. The cost grows with the sample rate and with call-stack capture, and over a long-running profile the cumulative impact may matter.
- Use during reproducible high-CPU events: The most useful profile is captured while the high-CPU condition is actually happening. A profile taken under normal load may not show the problem at all.
- Brief captures: 30-second captures are typical. Longer captures yield more samples but also larger perf.data files and more overhead, so use the shortest capture that shows the problem.
- Don't leave perf running: A capture serves an investigation, and leaving perf running indefinitely is wasteful. Bound each capture with a duration, or stop it when done, so the production system returns to normal.
- Combine with other tools: perf shows CPU; other tools cover I/O, memory, and network. perf is one input to the full picture.
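One way to apply the cautions above is to gate the capture on the process actually being busy and to always bound its duration. The following is only a sketch: the helper names is_hot and capture_if_hot, the 80 percent threshold, and the PERF override variable are assumptions for illustration. Note also that ps typically reports CPU usage averaged over the process lifetime, so this is a coarse check rather than an instantaneous one.

```shell
#!/bin/sh
# PERF can be overridden (for example PERF=echo) to preview commands.
PERF="${PERF:-perf}"

# Succeeds when a CPU percentage (possibly fractional) meets a threshold.
is_hot() {
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu + 0 >= limit + 0) }'
}

# Hypothetical guard: record only if the process looks busy, and always for a
# bounded time so perf never runs indefinitely.
capture_if_hot() {
    pid="$1"
    threshold="${2:-80}"  # percent CPU; assumed default
    seconds="${3:-30}"    # capture length; assumed default

    # ps averages CPU over the process lifetime, so treat this as approximate.
    cpu=$(LC_ALL=C ps -o %cpu= -p "$pid" | tr -d ' ')

    if is_hot "${cpu:-0}" "$threshold"; then
        "$PERF" record -g -p "$pid" -- sleep "$seconds"
    else
        echo "pid $pid at ${cpu:-0}% CPU, below ${threshold}%; skipping" >&2
        return 1
    fi
}

# Example (not run here): capture_if_hot 1234 80 30
```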
perf for CPU profiling is one of those Linux operational skills that pays off in performance investigations. Nova AI Ops integrates with system telemetry, surfaces CPU patterns at scale, and complements local-tool investigation.