tcpdump & strace Cheat Sheet

When the dashboards are green and the service is still broken, you reach for these two. Network packets, syscalls, and six flag combos per tool that catch 90% of the weird stuff.

tcpdump, by host and port

The host filter is the first one you reach for. Pick a peer, pick a port, capture only that conversation. Anything broader and you're scrolling instead of debugging.

tcpdump -i any host 10.0.5.12, all traffic to/from one IP, any interface.
tcpdump -i eth0 src 10.0.5.12, only packets from that host. dst for the other direction.
tcpdump -i any port 5432, everything on Postgres' port. Pair with host to scope further.
tcpdump -i any portrange 8000-8100, whole range when you don't know the exact port.
tcpdump -i any host db01 and not port 22, exclude SSH so your own session doesn't pollute the trace.
tcpdump -i any 'host db01 and (port 5432 or port 6432)', parens + quotes for compound filters.
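When the port list grows, hand-quoting compound filters gets error-prone. Here is a minimal sketch, assuming a POSIX shell. tcpdump_filter is a hypothetical helper, not part of tcpdump, and it only builds the filter string.

```shell
#!/bin/sh
# Hypothetical helper: build "host H and (port P1 or port P2 ...)".
# Usage: tcpdump_filter HOST [PORT...]
tcpdump_filter() {
    host=$1
    shift
    # No ports given: a plain host filter, no empty parens.
    if [ $# -eq 0 ]; then
        printf 'host %s\n' "$host"
        return
    fi
    ports=""
    for p in "$@"; do
        # Join ports with " or "; the first one gets no separator.
        if [ -z "$ports" ]; then
            ports="port $p"
        else
            ports="$ports or port $p"
        fi
    done
    printf 'host %s and (%s)\n' "$host" "$ports"
}

# Prints: host db01 and (port 5432 or port 6432)
tcpdump_filter db01 5432 6432
```

Pass the result as one argument, e.g. tcpdump -i any -nn "$(tcpdump_filter db01 5432 6432)". The double quotes play the same role as the single quotes in the cheat-sheet line.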

tcpdump, protocols and flags

Once you've narrowed by endpoint, narrow by what's wrong. SYN floods, RST storms, ICMP unreachables, each has a one-line filter.

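tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0', SYNs and SYN-ACKs, i.e. connection attempts in either direction plus their replies. For initial SYNs only: 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'.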
tcpdump -i any 'tcp[tcpflags] & (tcp-rst|tcp-fin) != 0', teardowns and resets. RST surge usually means something killed sockets.
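tcpdump -i any icmp, ICMP only. Narrow to "destination unreachable" with 'icmp[icmptype] == 3' (or icmp-unreach).
tcpdump -i any udp port 53, DNS. tcpdump decodes query names and answers on its own, no -A needed. Drop udp to also catch DNS over TCP.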
tcpdump -i any 'tcp port 443 and (tcp[((tcp[12:1] & 0xf0) >> 2):1] = 0x16)', TLS handshakes only. Useful when you want to see who's negotiating, not the bulk traffic.
tcpdump -i any vlan, tagged frames. Surprisingly handy on shared hosts.
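The filters above are plain bit arithmetic on header bytes. The sketch below replays that arithmetic in POSIX shell on a single byte, which makes it easy to check what a mask matches. The function names are hypothetical. The flag values are the standard TCP header bits that tcpdump names tcp-fin, tcp-syn, tcp-rst and tcp-ack. In the TLS filter, 0x16 is the TLS "handshake" record content type.

```shell
#!/bin/sh
# Standard TCP flag bits in tcp[13], which tcpdump calls tcp[tcpflags].
TCP_FIN=1; TCP_SYN=2; TCP_RST=4; TCP_ACK=16

# Mirror of 'tcp[tcpflags] & tcp-syn != 0': true for SYN and SYN-ACK.
has_syn() { [ $(( $1 & TCP_SYN )) -ne 0 ]; }

# Mirror of 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn': initial SYN only.
is_initial_syn() { [ $(( $1 & (TCP_SYN | TCP_ACK) )) -eq "$TCP_SYN" ]; }

# Mirror of '(tcp[12:1] & 0xf0) >> 2': the high nibble of byte 12 is the
# header length in 32-bit words. Shifting right by 2 instead of 4 multiplies
# by 4, giving the byte offset where the payload (TLS record) starts.
tcp_payload_offset() { echo $(( ($1 & 0xf0) >> 2 )); }

has_syn 0x12 && echo "SYN-ACK (0x12) matches the tcp-syn filter"
is_initial_syn 0x12 || echo "SYN-ACK (0x12) fails the initial-SYN filter"
echo "payload offset for tcp[12]=0x50: $(tcp_payload_offset 0x50)"  # prints 20
```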

tcpdump, six flag combos that earn their keep

tcpdump -i any -nn -tttt port 80, numeric IPs, numeric ports, full timestamps. The default for any capture you'll read later.
tcpdump -i any -A port 80, print payload as ASCII. Read HTTP headers without Wireshark.
tcpdump -i any -X port 5432, hex + ASCII side by side. For binary protocols.
tcpdump -i any -s0 -w trace.pcap host 10.0.5.12, full packet length, written to a file. Open it in Wireshark on your laptop. (-s0 is already the default on modern tcpdump but matters on old builds.)
tcpdump -i any -c 100 port 443, stop after 100 packets. Saves you from accidentally capturing 4 GB.
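tcpdump -i any -G 60 -W 10 -w trace-%H%M%S.pcap host db01, new file every 60s, exit after the 10th file (about 10 minutes). Set it up before a flaky deploy and walk away. For a ring buffer that keeps running and overwrites old files, use -C (size in millions of bytes) with -W instead.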
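When writing to a file, -nn and -tttt change nothing in the file: -w stores raw packets, and the display flags only matter when you read the capture back with -r. Here is a minimal sketch, assuming a POSIX shell, with hypothetical helper names (capture, replay, run) and a hypothetical DRY_RUN switch that prints the command instead of running it. It pairs -c with -w so the file stays bounded.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# capture FILE COUNT FILTER: full packets, hard packet cap, raw file output.
capture() {
    run tcpdump -i any -s0 -c "$2" -w "$1" "$3"
}

# replay FILE: read it back with numeric hosts/ports and full timestamps.
replay() {
    run tcpdump -nn -tttt -r "$1"
}

DRY_RUN=1 capture trace.pcap 10000 'host 10.0.5.12 and port 5432'
DRY_RUN=1 replay trace.pcap
```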

strace, attach and trace

Two modes. Either you launch the process under strace, or you attach to a running PID. Attach is the one you'll use 95% of the time during incidents.

strace -p 4521, attach to PID. Detach with Ctrl-C; the process keeps running.
strace -p 4521 -f, follow forks/threads. Almost always what you want.
strace -p 4521 -ff -o trace, one file per thread (trace.4521, trace.4522...). Easier to grep.
strace -p $(pgrep -d, -f myservice), attach to all matching PIDs at once.
strace ./binary --flag, launch under strace from the start. Catches startup syscalls.
strace -e trace=network curl https://api.example.com, only network syscalls. file, process, signal, ipc are the other handy groups.
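After -ff -o has split the trace per thread, a loop over the files shows which threads are hitting errors. Here is a minimal sketch, assuming a POSIX shell and the trace.<tid> naming from the -ff -o line above. errors_per_thread is a hypothetical helper. It counts lines containing " = -1 E" as failed syscalls, which matches how strace prints errno returns (e.g. = -1 ENOENT (No such file or directory)).

```shell
#!/bin/sh
# Hypothetical helper: one line per trace.<tid> file, "<tid> <failed calls>".
# Usage: errors_per_thread [PREFIX]   (PREFIX defaults to "trace")
errors_per_thread() {
    prefix=${1:-trace}
    for f in "$prefix".*; do
        [ -e "$f" ] || continue          # glob matched nothing
        tid=${f##*.}                     # text after the last dot = thread id
        # grep -c prints 0 (and exits 1) when nothing matches; output is kept.
        printf '%s %s\n' "$tid" "$(grep -c ' = -1 E' "$f")"
    done
}
```

Run it in the directory where strace wrote the files, e.g. errors_per_thread trace | sort -k2 -rn to put the noisiest thread first.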

strace, finding slow syscalls

Where strace earns its keep is on the latency side. Two flags: -T for per-syscall duration, -c for the summary at the end.

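strace -p 4521 -T 2>&1 | awk 'match($0, /<[0-9.]+>$/) && substr($0, RSTART+1, RLENGTH-2)+0 > 0.1', print only syscalls slower than 100ms. The duration has to be cut out of its <...> brackets first; $NF+0 on "<0.25>" evaluates to 0, so a bare $NF comparison never fires.
strace -p 4521 -c, sample for a while, Ctrl-C, get a summary table sorted by time. By default that time is kernel CPU time, so a call that blocks for seconds in read() barely registers. Add -w to summarise wall-clock latency and it points straight at the bottleneck syscall.
strace -p 4521 -e trace=read,write -T -yy, just I/O, with timings. Because -yy labels each fd as a file path or a socket, slow disk vs slow network stops being a guess.
strace -p 4521 -f -e trace=futex -c -w, futex time is threads waiting on locks or condition variables. Idle worker threads park in futex too, so a big number alone isn't proof. If futex waits dominate while requests are slow, look at the threading model.
strace -p 4521 -yy, print the path or socket endpoints alongside each fd. read(7, ...) is unreadable; read(7<TCP:[...->10.0.5.12:5432]>, ...) tells the story. A single -y shows file paths but only socket:[inode] for sockets.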
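To check the slow-syscall filter before trusting it mid-incident, the same awk logic can be wrapped and fed sample lines. Here is a minimal sketch, assuming a POSIX shell and awk. slow_syscalls is a hypothetical name, and the sample lines are made up in strace -T format.

```shell
#!/bin/sh
# Print strace -T lines whose trailing <seconds> value exceeds a threshold.
# Usage: strace -p PID -T 2>&1 | slow_syscalls [SECONDS]   (default 0.1)
slow_syscalls() {
    awk -v limit="${1:-0.1}" '
        match($0, /<[0-9.]+>$/) {
            # Strip the < and > around the duration, force it numeric.
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (t > limit + 0) print
        }'
}

# Made-up sample input; only the futex line (0.25s) is printed.
slow_syscalls 0.1 <<'EOF'
read(7, "..."..., 4096) = 4096 <0.000041>
futex(0x7f00, FUTEX_WAIT_PRIVATE, 2, NULL) = 0 <0.250000>
write(9, "x", 1) = 1 <0.000012>
EOF
```

Lines such as <unfinished ...> don't match the numeric pattern, so they're skipped rather than miscounted.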

strace, six flag combos that earn their keep

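strace -p PID -f -e trace=network -yy, what's the process talking to? Endpoints shown inline with each socket fd.
strace -p PID -f -e trace=file -y, the "why can't it find this file?" classic. Look for ENOENT. The file group covers openat, stat, newfstatat, statx and friends; a bare trace=stat misses the newfstatat/statx calls newer glibc actually makes.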
strace -p PID -f -c, the 30-second profile. Run, wait, Ctrl-C, read the summary.
strace -p PID -f -e trace=connect,sendto,recvfrom -T, the latency-of-network-calls view.
strace -p PID -f -s 256, show first 256 bytes of strings (default is 32). Now you can see SQL queries and HTTP bodies.
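strace -p PID -f -e trace=none, what signals is the process getting? No syscall noise, just the --- SIGxxx --- lines. SIGTERM right before a crash answers a lot. (-e signal=all is already the default, so on its own it filters nothing.)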
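When one of these runs is saved with -o, a quick pass over the file gives the two answers you usually want first: which errno keeps coming back, and which signals arrived. Here is a minimal sketch, assuming a POSIX shell. strace_triage is a hypothetical helper. It relies on strace printing failures as = -1 ERRNO (description) and delivered signals as --- SIGNAME {...} ---.

```shell
#!/bin/sh
# Hypothetical triage for a saved trace, e.g. strace -p PID -f -o app.trace
strace_triage() {
    echo "failed syscalls by errno:"
    # Keep only the errno name after the last " = -1 ", then count each.
    sed -n 's/.* = -1 \(E[A-Z0-9]*\).*/\1/p' "$1" | sort | uniq -c | sort -rn
    echo "signals:"
    # -- stops grep from reading the pattern as an option.
    grep -- '--- SIG' "$1" || echo "  none"
}
```

Usage: strace_triage app.trace. A pile of ENOENT points back at the trace=file line; ECONNREFUSED or ETIMEDOUT points at the network ones.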