strace for Syscall Debugging
strace shows what a process is actually doing.
Attach
strace is the standard Linux tool for tracing system calls. When a process is doing something mysterious, strace shows which system calls it makes, along with their arguments and return values. The discipline is using strace to investigate behavior at the kernel boundary.
What attaching looks like:
- strace -p PID attaches to a running process: The -p flag attaches to an existing process by PID. The process keeps running while strace observes it and prints each system call it makes.
- strace -f follows forks: Multi-process applications fork children. -f follows fork, vfork and clone, so child processes and threads are traced too and the full picture is captured. Combined with -p, it also attaches to every existing thread of a multi-threaded process.
- strace -ff for separate output: Combined with -o FILE, -ff writes each process's trace to its own FILE.PID file. Multi-process traces stay organized, and analysis can be done per process.
- Output is verbose: strace produces a lot of output, because most processes make many system calls. Filters select what to focus on.
- Need permission: Attaching to a process requires the same user, root, or CAP_SYS_PTRACE. On kernels with the Yama security module, /proc/sys/kernel/yama/ptrace_scope can restrict this further: a value of 1 allows attaching only to descendants unless the tracer has CAP_SYS_PTRACE. The team's setup must accommodate this.
Attaching is the basic operation. The team's investigation starts here.
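The attach options above can be assembled in a script. The following is a minimal sketch, not part of strace itself. `build_attach_cmd` and `ptrace_scope` are hypothetical helper names. The first only builds the argument list for strace's real -p, -f, -ff and -o flags and runs nothing. The second reads the Yama setting, and it assumes a Linux host. On kernels without Yama, that file is absent and the helper returns None.

```python
from pathlib import Path

# Yama LSM knob; absent on kernels built without Yama.
YAMA_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")


def build_attach_cmd(pid, follow_forks=False, separate_files=False, output=None):
    """Hypothetical helper: build an strace argv that attaches to `pid`.

    -p PID attaches, -f follows children and threads, and -ff splits the
    trace into OUTPUT.<pid> files (strace only splits when -o is given).
    """
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    if separate_files and output is None:
        raise ValueError("-ff needs -o OUTPUT to write per-process files")
    cmd = ["strace"]
    if separate_files:
        cmd.append("-ff")  # -ff also follows forks
    elif follow_forks:
        cmd.append("-f")
    if output is not None:
        cmd += ["-o", output]
    cmd += ["-p", str(pid)]
    return cmd


def ptrace_scope():
    """Return the Yama ptrace_scope value, or None if it cannot be read.

    0 = classic ptrace rules, 1 = descendants only (unless CAP_SYS_PTRACE),
    2 = CAP_SYS_PTRACE required, 3 = attaching disabled.
    """
    try:
        return int(YAMA_SCOPE.read_text().strip())
    except (OSError, ValueError):
        return None
```

For example, `build_attach_cmd(1234, separate_files=True, output="trace.out")` yields the argv for `strace -ff -o trace.out -p 1234`, which strace turns into trace.out.<pid> files. Checking `ptrace_scope()` first helps explain an "Operation not permitted" error before anyone reaches for root.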
Filter
The full strace output is overwhelming. Filters narrow the focus to relevant system calls; the analysis becomes targeted.
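- strace -e trace=open,openat,read shows file opens and reads: The -e trace flag filters to specific syscalls. Modern glibc usually opens files with openat rather than open, so include both to avoid an empty trace. Use connect for networking, and other specific syscalls for other concerns.
- strace -c summarizes by syscall: The -c flag prints a summary table with time, call count and error count per syscall when tracing ends, and suppresses the normal per-call output (-C keeps both). By default the time column is system CPU time. Add -w to summarize wall-clock time when looking for calls that block.
- Filter by category: trace=file selects syscalls that take a file name as an argument. trace=network selects networking, and trace=process selects process management such as fork, exec and wait. Categories make filtering easier than listing syscalls by name.
- Filter by failure: Recent strace versions print only failing syscalls with -Z (--failed-only), so the team sees just the errors. Do not confuse this with -e fault=, which injects failures into syscalls instead of filtering them.
- Combine filters: Filters compose. For example, strace -Z -e trace=network -p PID shows only failed network syscalls. The combination is highly targeted, and the output stays manageable.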
Filtering is what makes strace usable. Without filters, the output is overwhelming.
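Filtering can also be applied after the fact to a trace saved with -o. The following is a minimal Python sketch of that idea. It assumes the default line format without timestamps (no -t, -tt or -r), optionally prefixed with a PID as -f produces. `parse_line`, `filter_calls` and `count_calls` are hypothetical names. The `syscalls` set plays the role of -e trace=, `failed_only` plays the role of -Z, and `count_calls` reproduces the calls and errors columns of a -c table. Signal lines, exit lines, and the split `<unfinished ...>` / `resumed` lines are deliberately skipped.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Optional "[pid  123] " (terminal output with -f) or "123  " (-f with -o)
# prefix, then: name(args) = retval [ERRNO (description)]
LINE_RE = re.compile(
    r"^(?:\[pid\s+(\d+)\]\s+|(\d+)\s+)?"
    r"([a-z_][a-z0-9_]*)\((.*)\)\s+=\s+(\S+)"
    r"(?:\s+(E[A-Z0-9]+))?"
)


class Call(NamedTuple):
    pid: Optional[int]
    name: str
    args: str
    retval: str
    errno: Optional[str]


def parse_line(line):
    """Parse one complete syscall line; return None for anything else."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    pid = m.group(1) or m.group(2)
    return Call(int(pid) if pid else None, m.group(3), m.group(4),
                m.group(5), m.group(6))


def filter_calls(lines, syscalls=None, failed_only=False):
    """Offline analogue of -e trace=SET and -Z: yield matching calls."""
    for line in lines:
        call = parse_line(line)
        if call is None:
            continue  # signals, exits, unfinished/resumed fragments
        if syscalls is not None and call.name not in syscalls:
            continue
        if failed_only and call.errno is None:
            continue
        yield call


def count_calls(calls):
    """Per-syscall call and error counts, like two columns of -c output."""
    calls_by_name, errors_by_name = Counter(), Counter()
    for call in calls:
        calls_by_name[call.name] += 1
        if call.errno is not None:
            errors_by_name[call.name] += 1
    return calls_by_name, errors_by_name
```

To analyze a saved capture, pass an open file object, for example `filter_calls(open("trace.out"), failed_only=True)`. The trailing newlines do no harm, because the return value stops at whitespace. A regex parser like this is a convenience for quick triage, not a full strace output decoder.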
Careful
strace has significant overhead: the application slows down while it is traced, so production use should be brief.
- Slows the process: strace's overhead is real. It works through ptrace, so the traced process stops at the entry and exit of every syscall while strace decodes it. Throughput drops and latency increases.
- Don't leave it running: A long-running strace keeps imposing that overhead. The application stays degraded for as long as strace is attached, and production performance suffers. The discipline is brief use.
- In production, keep attaches short: Attach, capture what is needed (ideally to a file with -o), detach, and analyze offline. The production impact stays bounded.
- Detach properly: Ctrl+C detaches strace cleanly. The process returns to normal speed, and the trace captured so far is kept.
- Consider alternatives: perf trace and eBPF-based tools such as bpftrace have much lower overhead because they do not stop the process on every syscall. For continuous tracing or production-sensitive workloads, they may fit better.
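One way to keep a production attach brief is to put a hard time limit on it. The following is a minimal sketch, assuming GNU coreutils `timeout` is installed. Its `-s INT` option sends SIGINT, the same signal as Ctrl+C, which is intended to make strace detach the way an interactive user would. `brief_capture_cmd` and `run_brief_capture` are hypothetical helpers, and any window length you pass is your own choice rather than a strace recommendation.

```python
import shutil
import subprocess


def brief_capture_cmd(pid, seconds, outfile, syscalls=None):
    """Hypothetical helper: argv for a time-boxed attach written to a file.

    Shell equivalent:
      timeout -s INT SECONDS strace -f [-e trace=...] -o OUTFILE -p PID
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cmd = ["timeout", "-s", "INT", str(seconds), "strace", "-f"]
    if syscalls:
        # Narrow what is recorded as well as how long it is recorded.
        cmd += ["-e", "trace=" + ",".join(syscalls)]
    cmd += ["-o", outfile, "-p", str(pid)]
    return cmd


def run_brief_capture(pid, seconds, outfile, syscalls=None):
    """Run the capture; needs strace, timeout and ptrace permission."""
    for tool in ("timeout", "strace"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    result = subprocess.run(brief_capture_cmd(pid, seconds, outfile, syscalls))
    # timeout exits with 124 when the window expired, the expected case here;
    # 0 means strace ended on its own first (for example, the process exited).
    return result.returncode in (0, 124)
```

Note that -e trace= narrows the output. In strace's default ptrace mode, though, the process still stops on every syscall, so the time limit is what actually bounds the overhead.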
strace for syscall debugging is one of those Linux operational skills that pays off in mysterious-process investigations. Nova AI Ops integrates with system telemetry and complements local-tool tracing with cluster-wide visibility.