tcpdump & strace Cheat Sheet
When the dashboards are green and the service is still broken, you reach for these two. Network packets, syscalls, and six flag combos per tool that catch 90% of the weird stuff.
tcpdump, by host and port
The host filter is the first one you reach for. Pick a peer, pick a port, capture only that conversation. Anything broader and you're scrolling instead of debugging.
tcpdump -i any host 10.0.5.12, all traffic to/from one IP, any interface.
tcpdump -i eth0 src 10.0.5.12, only packets from that host. Use dst for the other direction.
tcpdump -i any port 5432, everything on Postgres' port. Pair with host to scope further.
tcpdump -i any portrange 8000-8100, the whole range, for when you don't know the exact port.
tcpdump -i any host db01 and not port 22, excludes SSH so your own session doesn't pollute the trace.
tcpdump -i any 'host db01 and (port 5432 or port 6432)', parens + quotes for compound filters.
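The parens in that last filter do real work. pcap-filter gives `and` and `or` equal precedence, evaluated left to right, so `host db01 and port 5432 or port 6432` means `(host db01 and port 5432) or port 6432` and quietly captures 6432 traffic from every host. Below is a minimal sketch of a helper that always emits the parenthesized form. The function name `host_ports_filter` is hypothetical and not part of tcpdump.

```shell
#!/bin/sh
# Hypothetical helper: prints "host H and (port P1 or port P2 ...)".
# The parentheses keep the host restriction applied to every port, because
# pcap-filter evaluates "and"/"or" with equal precedence, left to right.
host_ports_filter() {
    fhost=$1
    shift
    fports=""
    for p in "$@"; do
        if [ -z "$fports" ]; then
            fports="port $p"
        else
            fports="$fports or port $p"
        fi
    done
    printf 'host %s and (%s)\n' "$fhost" "$fports"
}

host_ports_filter db01 5432 6432   # prints: host db01 and (port 5432 or port 6432)

# Pass the result as one quoted argument so the shell leaves the parens alone:
#   tcpdump -i any -nn "$(host_ports_filter db01 5432 6432)"
```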
tcpdump, protocols and flags
Once you've narrowed by endpoint, narrow by what's wrong. SYN floods, RST storms, ICMP unreachables: each has a one-line filter.
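tcpdump -i any 'tcp[tcpflags] & tcp-syn != 0', every packet with SYN set, which includes SYN-ACKs. To see only connection attempts, use 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'.
tcpdump -i any 'tcp[tcpflags] & (tcp-rst|tcp-fin) != 0', teardowns and resets. A RST surge usually means something is killing connections.
tcpdump -i any icmp, ICMP only. Use icmp[icmptype] == 3 for "destination unreachable".
tcpdump -i any udp port 53, DNS. tcpdump decodes query names and answers by default. Add -vv for more detail.
tcpdump -i any 'tcp port 443 and (tcp[((tcp[12:1] & 0xf0) >> 2):1] = 0x16)', TLS handshakes only. The expression reads the first payload byte, which is the TLS record type, and 0x16 means handshake. Useful when you want to see who's negotiating, not the bulk traffic.
tcpdump -i eth0 -e vlan, tagged frames, with -e printing each frame's VLAN ID. Use a real interface: the vlan primitive needs the Ethernet header, and the any pseudo-interface doesn't capture it. Surprisingly handy on shared hosts.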
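The flag and TLS filters above are bit arithmetic on raw header bytes. This sketch replays that arithmetic in POSIX shell, which lets you check a filter's logic before you trust it. The helper names (`is_syn`, `is_initial_syn`, `is_teardown`, `tcp_header_len`) are made up for illustration. The flag values are the standard TCP header bits behind tcpdump's `tcp-fin`, `tcp-syn`, `tcp-rst` and `tcp-ack` names.

```shell
#!/bin/sh
# TCP flag bits, as tcpdump names them: tcp-fin, tcp-syn, tcp-rst, tcp-ack.
TCP_FIN=1; TCP_SYN=2; TCP_RST=4; TCP_ACK=16

# 'tcp[tcpflags] & tcp-syn != 0': true for SYN (0x02) and SYN-ACK (0x12).
is_syn() { [ $(( $1 & TCP_SYN )) -ne 0 ]; }

# 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn': initial SYN only.
is_initial_syn() { [ $(( $1 & (TCP_SYN | TCP_ACK) )) -eq "$TCP_SYN" ]; }

# 'tcp[tcpflags] & (tcp-rst|tcp-fin) != 0': any RST or FIN.
is_teardown() { [ $(( $1 & (TCP_RST | TCP_FIN) )) -ne 0 ]; }

# tcp[12] holds the data offset (header length in 32-bit words) in its high
# nibble. (x & 0xf0) >> 2 equals (x >> 4) * 4, the header length in bytes.
# That is the offset of the first payload byte, which the TLS filter compares to 0x16.
tcp_header_len() { echo $(( ($1 & 0xf0) >> 2 )); }

tcp_header_len 0x50   # 5 words, no options: prints 20
tcp_header_len 0x80   # 8 words, e.g. with the timestamps option: prints 32
is_syn 0x12 && echo "SYN-ACK matches the tcp-syn filter"
is_initial_syn 0x12 || echo "SYN-ACK is excluded by the stricter filter"
```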
tcpdump, six flag combos that earn their keep
tcpdump -i any -nn -tttt port 80, numeric IPs, numeric ports, full timestamps. The default for any capture you'll read later.
tcpdump -i any -A port 80, print payload as ASCII. Read HTTP headers without Wireshark.
tcpdump -i any -X port 5432, hex + ASCII side by side. For binary protocols.
tcpdump -i any -s0 -w trace.pcap host 10.0.5.12, full packet length, write to file. Open in Wireshark on your laptop.
tcpdump -i any -c 100 port 443, stop after 100 packets. Saves you from accidentally capturing 4 GB.
tcpdump -i any -G 60 -W 10 -w trace-%H%M%S.pcap host db01, rotate every 60 s and exit after 10 files, giving a 10-minute window. With -G, -W caps the file count instead of overwriting. For a ring buffer that keeps running, use -C (file size, in millions of bytes) with -W. Set it up before a flaky deploy and walk away.
strace, attach and trace
Two modes. Either you launch the process under strace, or you attach to a running PID. Attach is the one you'll use 95% of the time during incidents.
strace -p 4521, attach to PID. Detach with Ctrl-C; the process keeps running.
strace -p 4521 -f, follow forks/threads. Almost always what you want.
strace -p 4521 -ff -o trace, one file per thread (trace.4521, trace.4522, ...). Easier to grep.
strace -p $(pgrep -d, -f myservice), attach to all matching PIDs at once.
strace ./binary --flag, launch under strace from the start. Catches startup syscalls.
strace -e trace=network curl https://api.example.com, only network syscalls. file, process, signal and ipc are the other handy groups.
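With -ff -o trace, strace appends each thread's ID to the file name, so a grep across trace.* tells you which thread is failing. The sketch below counts failed calls per thread. Failed calls show up as a return value of -1 followed by an errno name. The helper name `failed_per_thread` and the canned sample lines are hypothetical stand-ins for real output.

```shell
#!/bin/sh
# Hypothetical helper: for each file PREFIX.<tid> written by
# 'strace -ff -o PREFIX', print "<tid> <number of failed calls>".
failed_per_thread() {
    prefix=$1
    for f in "$prefix".*; do
        [ -e "$f" ] || continue                   # glob matched nothing
        n=$(grep -c ' = -1 E' "$f" || true)       # failures print "= -1 ERRNO (...)"
        printf '%s %s\n' "${f#"$prefix".}" "$n"
    done
}

# Canned sample lines standing in for real 'strace -ff -o trace' output.
dir=$(mktemp -d)
printf '%s\n' \
    'openat(AT_FDCWD, "/etc/myservice.conf", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'read(3, "", 4096) = 0' > "$dir/trace.4521"
printf '%s\n' \
    'write(5, "ping", 4) = 4' > "$dir/trace.4522"

failed_per_thread "$dir/trace"   # prints "4521 1" then "4522 0"
rm -rf "$dir"
```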
strace, finding slow syscalls
Where strace earns its keep is on the latency side. Two flags: -T for per-syscall duration, -c for the summary at the end.
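strace -p 4521 -T 2>&1 | awk 'match($0, /<[0-9.]+>$/) && substr($0, RSTART + 1, RLENGTH - 2) + 0 > 0.1', print only syscalls slower than 100 ms. The angle brackets have to be stripped before the comparison, because awk reads "<0.25>" as 0.
strace -p 4521 -c, sample for a while, Ctrl-C, get a summary table sorted by time. Points straight to the bottleneck syscall. By default the time is kernel CPU time. Add -w to rank by wall-clock time, so blocking calls like poll or read stop looking cheap.
strace -p 4521 -e trace=read,write -T -y, just I/O, with timings and the file or socket behind each fd. Separates slow disk from slow network.
strace -p 4521 -f -e trace=futex -c -w, lots of futex time means threads are waiting on locks or condition variables. If futex takes most of the wall time, check the threading model. Idle pool threads waiting for work show up here too.
strace -p 4521 -yy, print the path or socket endpoints alongside each fd. read(7) is unreadable; read(7<TCP:[10.0.5.7:51834->10.0.5.12:5432]>) tells the story. Plain -y shows sockets only as socket:[inode].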
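Here is the slow-syscall filter wrapped as a reusable function. -T appends the elapsed seconds to each line as <0.000123>. The sketch strips the angle brackets before comparing, because awk converts a string starting with < to 0. The function name `slow_syscalls` and the sample lines are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: read strace -T output on stdin and print only the
# calls that took longer than $1 seconds (default 0.1).
slow_syscalls() {
    awk -v min="${1:-0.1}" '
        match($0, /<[0-9.]+>$/) {
            # strip the < > so the duration compares as a number
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 > min + 0) print
        }'
}

# Live use:  strace -p 4521 -f -T 2>&1 | slow_syscalls 0.1
# Canned sample lines in the -T format:
printf '%s\n' \
    'read(7, "\1\2"..., 4096) = 4096 <0.000031>' \
    'poll([{fd=7, events=POLLIN}], 1, 5000) = 1 ([{fd=7, revents=POLLIN}]) <0.250114>' \
    'write(1, "ok\n", 3) = 3 <0.000012>' |
    slow_syscalls 0.1    # prints only the poll line
```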
strace, six flag combos that earn their keep
strace -p PID -f -e trace=network -yy, what's the process talking to? Endpoint shown inline.
strace -p PID -f -e trace=file -y, the "why can't it find this file?" classic: look for ENOENT. The file group covers every syscall that takes a path, including the newfstatat and statx calls modern glibc makes instead of stat.
strace -p PID -f -c, the 30-second profile. Run, wait, Ctrl-C, read the summary.
strace -p PID -f -e trace=connect,sendto,recvfrom -T, the latency-of-network-calls view.
strace -p PID -f -s 256, show the first 256 bytes of strings (default is 32). Now you can see SQL queries and HTTP bodies.
strace -p PID -f -e trace=none -e signal=all, what signals is the process getting? signal=all is already the default. trace=none hides the syscalls, so only signal deliveries print. A SIGTERM right before a crash answers a lot.
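For the "why can't it find this file?" case, the useful output is usually just the list of paths that came back ENOENT. This sketch takes the first quoted argument from each failed line and deduplicates the results. For openat, newfstatat, access and similar calls, that first quoted argument is the path. The function name `missing_paths` and the sample lines are hypothetical. Paths containing escaped quotes would confuse the simple regex.

```shell
#!/bin/sh
# Hypothetical helper: from strace output on stdin, print each path whose
# lookup failed with ENOENT, once, in first-seen order. It takes the first
# quoted argument on the line, which is the path for openat, newfstatat,
# access and similar calls.
missing_paths() {
    awk '/= -1 ENOENT/ && match($0, /"[^"]*"/) {
        p = substr($0, RSTART + 1, RLENGTH - 2)
        if (!(p in seen)) { seen[p] = 1; print p }
    }'
}

# Live use:  strace -p PID -f -e trace=file 2>&1 | missing_paths
# Canned sample lines:
printf '%s\n' \
    'openat(AT_FDCWD, "/etc/myservice/config.yaml", O_RDONLY) = -1 ENOENT (No such file or directory)' \
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3' \
    'newfstatat(AT_FDCWD, "/etc/myservice/config.yaml", 0x7ffd5e1c2a40, 0) = -1 ENOENT (No such file or directory)' |
    missing_paths    # prints /etc/myservice/config.yaml once
```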