File Descriptor Limits
You will hit them; here is how to fix them.
Overview
Every Linux process has a file descriptor (FD) limit, and that limit affects real workloads. The common default soft limit of 1024 is a legacy value better suited to desktop use; production server workloads need explicit limits matched to the workload to avoid mysterious connection failures.
- Hit them, then fix. The limit is per process; the failure mode is "too many open files" (EMFILE) at exactly the wrong moment.
- Per-process limit. ulimit -n shows the per-process FD ceiling; the default of 1024 is rarely enough.
- Per-service LimitNOFILE. The unit-file limit set per systemd service; the modern Linux pattern; the systemd unit is the source of truth.
- Per-container limits and connection-heavy workloads. In K8s the limit is applied per container by the runtime; web servers and databases routinely need 100k+ FDs.
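To see which limit a process is actually running with, the soft and hard values can be read from inside the process. The following is a minimal sketch using Python's standard resource module; the function names fd_limits and raise_soft_limit_to_hard are hypothetical helpers chosen for illustration, not part of any tool mentioned here.

```python
import resource


def fd_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Raise the soft FD limit up to the hard limit.

    An unprivileged process may raise its soft limit as far as the hard
    limit, but not beyond. If the hard limit is reported as unlimited,
    this sketch leaves the limits untouched rather than guess a value.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft != hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", fd_limits())
    print("after: ", raise_soft_limit_to_hard())
```

Raising the soft limit in-process only helps up to the hard limit; the hard limit itself still comes from whatever started the process, such as the systemd unit or the container runtime.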
The approach
The practical approach: set LimitNOFILE per service in systemd, set the per-container limit in K8s, monitor FD usage as a first-class metric, give connection-heavy workloads generous limits, and document the rationale for each service. Applied consistently, this makes process behaviour predictable under load.
- Per-service LimitNOFILE. Per-service systemd unit setting; the source of truth for the FD limit.
- Per-container limit. Per-container runtime limit in K8s; sets the container-level ceiling.
- Monitor FD usage. Per-process FD count; the metric that warns before "too many open files."
- Generous limits. 100k+ for web servers; the cost of a high ceiling is negligible for most software, and the protection is real.
- Document the limit. Per-service rationale committed to the repo; supports operational reviews.
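The monitoring step above can be sketched as a small check that compares the number of open descriptors with the soft limit. This is an illustrative sketch only: it assumes a Linux /proc filesystem, inspects the current process, and uses hypothetical names (open_fd_count, fd_usage_ratio, should_warn) and a hypothetical 80% warning threshold (WARN_RATIO).

```python
import os
import resource

WARN_RATIO = 0.8  # hypothetical threshold: warn at 80% of the soft limit


def open_fd_count():
    """Count this process's open descriptors via /proc/self/fd (Linux only).

    Listing the directory briefly opens one descriptor itself, so the
    result includes that transient entry.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_usage_ratio():
    """Return open FDs as a fraction of the soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft <= 0:
        return 0.0  # no finite ceiling to compare against
    return open_fd_count() / soft


def should_warn(ratio, threshold=WARN_RATIO):
    """True when usage has reached the warning threshold."""
    return ratio >= threshold


if __name__ == "__main__":
    ratio = fd_usage_ratio()
    print(f"open FDs: {open_fd_count()}, usage: {ratio:.1%}")
    if should_warn(ratio):
        print("warning: approaching the FD limit")
```

In practice the ratio would be exported to whatever metrics system the team already uses, so that the alert fires well before the process reaches "too many open files."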
Why this compounds
FD discipline compounds across services. Each correctly set limit prevents incidents, the team’s Linux expertise grows, and new services ship with appropriate limits.
- Better resilience. Right limit avoids exhaustion; the service does not fail under expected load.
- Better operational fit. Right limit matches workload; the connection-heavy service has the FDs it needs.
- Better incident response. FD usage monitoring catches issues; the alert fires before user-visible failure.
- Institutional knowledge. Each limit teaches Linux; the team’s runtime engineering muscle grows.
FD discipline is an operational discipline that pays off across years. Nova AI Ops integrates with system telemetry, surfaces patterns, and supports the team’s runtime discipline.