Noisy Neighbor Alerts

In a multi-tenant system, one tenant's load can degrade service for the others. Alert on it.

What a noisy neighbor is

A noisy neighbor is a workload that takes a disproportionate share of infrastructure it shares with other workloads: it spikes, and the others see latency, errors, or throttling. The pattern is common in Kubernetes (CPU and memory contention), shared databases (lock contention), and shared networks (bandwidth). Noisy-neighbor alerts aim to catch the contention before users see it.

What to alert on

The signals are well understood: CPU throttling, memory pressure, database lock waits, connection pool exhaustion, and network packet drops. Each maps to a specific contention class, and each can be given a threshold that fires before users notice the contention.
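As a rough illustration of the signal-to-class mapping, the sketch below pairs each signal with a contention class and a threshold, then reports which rules fire for a set of observed values. The signal names and every threshold value are assumptions made for the example, not recommended settings; real thresholds should come from your own baselines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    signal: str            # hypothetical metric name
    contention_class: str  # what kind of contention the signal indicates
    threshold: float       # fire when the observed value is >= threshold


# Illustrative values only (assumptions); tune against your own baselines.
RULES = [
    # Could be derived from cAdvisor's container_cpu_cfs_throttled_periods_total
    # divided by container_cpu_cfs_periods_total.
    Rule("cpu_throttled_ratio", "cpu", 0.25),
    Rule("memory_working_set_ratio", "memory", 0.90),
    Rule("db_lock_wait_seconds_p95", "database locks", 1.0),
    Rule("conn_pool_utilization", "connection pool", 0.85),
    Rule("net_packet_drop_ratio", "network", 0.01),
]


def evaluate(samples: dict) -> list:
    """Return (contention_class, signal, value) for every rule that fires.

    Signals missing from `samples` are skipped rather than treated as zero.
    """
    firing = []
    for rule in RULES:
        value = samples.get(rule.signal)
        if value is not None and value >= rule.threshold:
            firing.append((rule.contention_class, rule.signal, value))
    return firing
```

Calling evaluate({"cpu_throttled_ratio": 0.4, "conn_pool_utilization": 0.5}) reports only the CPU rule, because the pool utilization is below its assumed threshold.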

Attributing the noise

An alert that says "a pod is throttled" is unhelpful. The on-call engineer needs "pod X on node Y is throttling pod Z." Attribution is built from cgroup metrics plus node-level metrics, and the top-N consumers per node are surfaced in the alert payload so the on-call engineer has a starting point within 30 seconds.
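One way the attribution step might look is sketched below: group per-pod samples by node, find the throttled pods, and attach the top-N CPU consumers on the same node to each alert payload. The PodSample fields, the payload keys, and the default threshold are all hypothetical; the ranking gives the on-call a list of suspects to check first, not proof of causation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSample:
    pod: str
    node: str
    cpu_cores_used: float   # assumed to come from cgroup CPU usage
    throttled_ratio: float  # throttled periods / total periods


def build_alert_payloads(samples, throttle_threshold=0.25, top_n=3):
    """Build one payload per throttled pod, listing its node's top consumers."""
    by_node = defaultdict(list)
    for sample in samples:
        by_node[sample.node].append(sample)

    payloads = []
    for node, pods in by_node.items():
        victims = [p for p in pods if p.throttled_ratio >= throttle_threshold]
        if not victims:
            continue
        # Rank every pod on the node by CPU use, heaviest first.
        ranked = sorted(pods, key=lambda p: p.cpu_cores_used, reverse=True)
        for victim in victims:
            # The victim itself is excluded from its own suspect list.
            suspects = [p for p in ranked if p.pod != victim.pod][:top_n]
            payloads.append({
                "node": node,
                "throttled_pod": victim.pod,
                "throttled_ratio": victim.throttled_ratio,
                "top_consumers": [(p.pod, p.cpu_cores_used) for p in suspects],
            })
    return payloads
```

With a batch job using 6 cores next to a throttled checkout pod on the same node, the payload for checkout lists the batch job first among top_consumers.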

Common remediations

The remediations are well understood: resource requests and limits, workload class separation, and QoS classes. Each addresses a different aspect of contention, and the discipline is to apply them in sequence rather than as one-off responses to incidents.
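To show how requests and limits relate to QoS classes in Kubernetes, here is a minimal sketch that derives a pod's QoS class from its containers' resource settings. It follows the documented rules (Guaranteed, Burstable, BestEffort) with two simplifications that are assumptions of this example: quantities are compared as literal strings, so "500m" and "0.5" would not match, and init containers are ignored.

```python
def qos_class(containers: list) -> str:
    """Classify a pod from a list of container dicts shaped like a pod spec.

    Simplified: quantity strings are compared literally, not parsed.
    """
    resources = [c.get("resources") or {} for c in containers]

    # BestEffort: no container sets any request or limit.
    if all(not r.get("requests") and not r.get("limits") for r in resources):
        return "BestEffort"

    # Guaranteed: every container has CPU and memory limits, and its
    # requests equal those limits (an unset request defaults to the limit).
    for r in resources:
        limits = r.get("limits") or {}
        requests = r.get("requests") or {}
        for name in ("cpu", "memory"):
            if name not in limits:
                return "Burstable"
            if requests.get(name, limits[name]) != limits[name]:
                return "Burstable"
    return "Guaranteed"
```

A pod that sets only a CPU request lands in Burstable, so it is evicted before Guaranteed pods under node memory pressure; auditing a cluster with a check like this can show which workloads have no settings at all.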

When to invest

Noisy-neighbor work is a real investment, and not every cluster needs it. Multi-tenant clusters and shared databases need attribution; single-tenant nodes do not. Above 50 services on shared infrastructure, attribution becomes load-bearing and the investment pays back fast.