Go Runtime Tuning and Profiling
Go is famous for working out of the box. At scale, knowing where the levers are pays back.
Why Go default works
Go ships with sane defaults for nearly every workload. Knowing why they work tells you when they will not.
- Concurrent GC. Tuned for sub-millisecond pauses; rarely the bottleneck even under heavy allocation.
- Goroutine scheduler. Cooperative, work-stealing; scales linearly across cores up to GOMAXPROCS.
- 95% rule. Defaults work for 95% of workloads; do not tune until you have measured.
- When defaults fail. Latency-critical services, very large heaps, container-CPU-pinned environments; the rest of this article covers those.
Four runtime parameters
- GOMAXPROCS, CPU count override.
- GOGC, GC trigger frequency.
- GOMEMLIMIT, soft memory limit (1.19+).
- GODEBUG, runtime tracing flags.
pprof workflow
pprof is the canonical profiler. Five steps from suspicion to verified fix; every Go performance investigation follows the same shape.
- Capture.
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30for CPU;/heapfor memory. - Analyze.
top,web,list <function>; the flame graph reveals the hot path. - Identify. Look for unexpected allocations, lock contention, or surprising entries near the top.
- Fix and verify. Land the fix; re-capture; confirm the hot path moved or shrank; do not assume.
Production profiling
Always-on production profiling beats panicked-capture during incidents. The trend across weeks reveals regressions before they become incidents.
- Tools. Pyroscope, Polar Signals, Datadog Continuous Profiler; all run with overhead under 1%.
- Always-on. Captured continuously; available for any past time window without re-capturing.
- Trend view. Compare this week's profile against last month's; allocation regressions surface before SLO breach.
- Incident value. Profile from the moment of the incident is already in the system; no scrambling to reproduce.
Antipatterns
- GOMAXPROCS not set in containers. Go sees host CPU count; thrashing.
- No production profiling. Diagnose blind.
- GC tuning by guess. Defaults are usually right.
What to do this week
Three moves. (1) Apply this pattern to your slowest production endpoint. (2) Measure p99 before/after. (3) Document the win and ship the runbook so the team can reproduce.