Network Throughput Debugging
iperf3, ss, ethtool, tcpdump.
Overview
Throughput debugging is about measuring actual achievable bandwidth and isolating bottlenecks layer by layer. The four tools below cover application-level synthetic load, TCP-level state, NIC settings, and wire-level packet inspection.
- iperf3. Synthetic throughput tests between two hosts. The first number to pull when an application reports slow transfers.
- ss -i. TCP internal info: congestion window, round-trip time, retransmits. Catches stack-level issues iperf alone cannot see.
- ethtool. NIC settings: link speed, duplex, ring buffers, offload state. Catches hardware-level mismatches that look like software problems.
- tcpdump and cloud limits. Wire-level packet inspection plus per-instance-type bandwidth caps documented by the cloud provider. Easy to forget the cloud cap is the actual ceiling.
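As a minimal sketch of pulling that first number, the snippet below summarizes a report produced by iperf3 in JSON mode (`iperf3 -c <server> --json`). It assumes a TCP test whose report carries `end.sum_sent` and `end.sum_received`. The function name `summarize_iperf3` and the sample values are hypothetical, made up for illustration.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Pull headline numbers out of an iperf3 JSON report (TCP test assumed)."""
    end = json.loads(report_json)["end"]
    sent = end["sum_sent"]
    received = end["sum_received"]
    return {
        "sent_gbps": sent["bits_per_second"] / 1e9,
        "received_gbps": received["bits_per_second"] / 1e9,
        # Retransmits may be missing (for example in UDP tests), so use .get().
        "retransmits": sent.get("retransmits"),
    }


# Trimmed, illustrative report. These values are invented, not measured.
SAMPLE_REPORT = """
{"end": {"sum_sent": {"bits_per_second": 4.0e9, "retransmits": 12},
         "sum_received": {"bits_per_second": 3.5e9}}}
"""

if __name__ == "__main__":
    print(summarize_iperf3(SAMPLE_REPORT))
```

A non-zero retransmit count alongside a low rate is a hint to look at TCP state next rather than at the application.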
The approach
The investigation order matters. Start with iperf3 to anchor the measurement, drop down to TCP state, then the NIC, then the wire. Skipping a layer is how investigations chase the wrong cause for an hour.
- iperf3 baseline. Measure achievable bandwidth between the two hosts in question. Without this number, every other measurement is uncalibrated.
- ss -i for TCP state. Congestion window, RTT, retransmits. Stack tuning lives at this layer.
- ethtool for NIC. Link speed and duplex mismatches. A NIC that negotiated 100Mbps half duplex on a 10Gbps switch port explains a lot of weird performance.
- tcpdump and cloud caps. Wire-level packet loss and the per-instance-type bandwidth ceiling. Often the answer is “this instance type tops out at 5Gbps.”
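The layered order above can be sketched as a comparison of per-layer ceilings. This is an illustration under stated assumptions, not a definitive tool: the sample lines imitate the `cwnd:`, `mss:`, `rtt:` and `retrans:` fields of `ss -ti` and the `Speed:` line of `ethtool <iface>`, the 5Gbps cloud cap is a placeholder taken from the example above, and all function names are hypothetical. The TCP ceiling uses the rough estimate cwnd × MSS / RTT, which is only a snapshot because the congestion window changes constantly.

```python
import re
from typing import Optional


def parse_ss_info(line: str) -> dict:
    """Extract cwnd, mss, rtt and total retransmits from one `ss -ti` info line."""
    def grab(pattern, cast):
        match = re.search(pattern, line)
        return cast(match.group(1)) if match else None

    return {
        "cwnd_segments": grab(r"\bcwnd:(\d+)", int),
        "mss_bytes": grab(r"\bmss:(\d+)", int),      # \b skips rcvmss/advmss
        "rtt_ms": grab(r"\brtt:([\d.]+)/", float),    # \b skips minrtt
        "retrans_total": grab(r"\bretrans:\d+/(\d+)", int),
    }


def window_limited_gbps(info: dict) -> float:
    """Rough ceiling: one congestion window of data per round trip."""
    bits_per_rtt = info["cwnd_segments"] * info["mss_bytes"] * 8
    return bits_per_rtt / (info["rtt_ms"] / 1000) / 1e9


def parse_ethtool_speed_gbps(output: str) -> Optional[float]:
    """Read the negotiated link speed; None if it is not reported in Mb/s."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", output)
    return int(match.group(1)) / 1000 if match else None


def lowest_ceiling(ceilings: dict) -> str:
    """Name the layer with the smallest ceiling."""
    return min(ceilings, key=ceilings.get)


# Illustrative samples. These values are invented, not captured from a host.
SS_SAMPLE = ("cubic wscale:7,7 rto:204 rtt:2.0/0.5 mss:1448 pmtu:1500 "
             "rcvmss:536 advmss:1448 cwnd:100 bytes_retrans:2896 "
             "retrans:0/2 minrtt:1.8")
ETHTOOL_SAMPLE = "Settings for eth0:\n\tSpeed: 10000Mb/s\n\tDuplex: Full\n"
ASSUMED_CLOUD_CAP_GBPS = 5.0  # placeholder; check the provider's documentation

if __name__ == "__main__":
    ceilings = {
        "tcp_window": window_limited_gbps(parse_ss_info(SS_SAMPLE)),
        "nic": parse_ethtool_speed_gbps(ETHTOOL_SAMPLE),
        "cloud_cap": ASSUMED_CLOUD_CAP_GBPS,
    }
    print(ceilings, "-> lowest:", lowest_ceiling(ceilings))
```

With these sample values the window-limited estimate is the smallest of the three, which would point the investigation at stack tuning before the NIC or the instance type.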
Why this compounds
Throughput-debugging fluency is one of the highest-leverage SRE skills because the same toolbox covers TCP, NIC, and cloud-cap investigations. Each session teaches the team a little more about how the network actually behaves.
- Faster MTTR on network issues. A fixed diagnostic order and familiar tools cut the time spent guessing which layer is at fault.
- Better network mental model. Each tool teaches a layer; over months the team has an end-to-end view.
- Right-sized capacity. Knowing the real achievable bandwidth keeps capacity plans honest.
- Year-one investment, year-two habit. The first investigations are slow. By year two the diagnostic order is reflexive.