
Database Connection Pool Tuning: The Three Numbers That Matter

Most "database is slow" incidents are pool exhaustion incidents in disguise. Three knobs decide whether your service breaks gracefully under load or stalls silently.

Pool exhaustion looks like a database problem

Symptoms: queries slow; some requests time out; the database itself looks idle. Logs show "could not acquire connection in N ms." On-call concludes "the database is overloaded" and starts looking at slow queries. Hours later, someone notices the pool size is 10 and the service handles 200 concurrent requests.

The pool is the choke point, not the database. The database can handle the load; the application just cannot get a connection to send it the query. A correctly tuned pool prevents the misdiagnosis.

Pool size: the math

The naive answer: bigger is better. The right answer: pool size = (number of concurrent queries the service must hold open at peak) × (a safety factor of 1.2), rounded up. Go bigger than that and you starve other services sharing the database; go smaller and requests queue behind the pool.
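For concreteness, a minimal sketch of that sizing calculation, assuming a JVM service on HikariCP (any pool with a max-size knob works the same way). The peak figure of 40 and the JDBC URL are stand-ins for your own measurement and endpoint:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSizing {
    public static void main(String[] args) {
        int peakConcurrentQueries = 40;                               // measure this at peak; do not guess
        int poolSize = (int) Math.ceil(peakConcurrentQueries * 1.2); // 40 -> 48

        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.internal:5432/app"); // placeholder URL
        config.setMaximumPoolSize(poolSize);

        HikariDataSource ds = new HikariDataSource(config); // hand this to the application
    }
}
```

For reference, HikariCP's out-of-the-box maximumPoolSize is 10, which is exactly the kind of silent default behind the incident described above.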

The Postgres ceiling. A typical Postgres instance handles 100-300 concurrent connections well; beyond that, performance falls off because Postgres uses one process per connection. If your sum-of-pools across all services exceeds 200, introduce a connection pooler (PgBouncer, Odyssey) between the apps and the database.

The per-pod calculation. With Kubernetes, pool size is per pod, not per service: 10 pods × a pool of 10 = 100 connections against the database. Plan for that before scaling out.

Idle timeout: the silent waste

An idle connection still consumes server-side memory and a connection slot. Default idle timeouts (often 10 minutes) waste that capacity. Set it to 60 seconds for most services: re-establishing a connection costs a few milliseconds, while the savings in connection slots are massive.
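A minimal sketch of the same move, again assuming HikariCP, whose idleTimeout default really is 10 minutes. One library-specific wrinkle: idle reaping only runs when minimumIdle is set below maximumPoolSize.

```java
import com.zaxxer.hikari.HikariConfig;

public class IdleTuning {
    /** Retire connections idle for 60 s; keep a couple warm for the next burst. */
    static void shortenIdleTimeout(HikariConfig config) {
        config.setIdleTimeout(60_000); // ms; reaping only applies when minimumIdle < maximumPoolSize
        config.setMinimumIdle(2);      // a few warm connections; the rest are released
    }
}
```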

The exception. If your TLS handshake to the database is slow (managed databases often add 50-200ms), longer idle timeouts make sense. Measure the handshake before tuning.

Acquire timeout: the loud failure

The most misconfigured of the three. The default is often "wait forever," meaning that under pool exhaustion your service hangs instead of failing fast. Set it to 3-5 seconds. The pod returns 503; the load balancer routes around it; the user gets a fast error instead of a 30-second hang.
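A sketch of that fail-fast shape, again assuming HikariCP, where connectionTimeout is the acquire timeout. The handleRequest method and status codes are illustrative, not any particular framework's API:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import java.sql.Connection;
import java.sql.SQLTransientConnectionException;

public class FailFastPool {
    static HikariDataSource newDataSource(String jdbcUrl) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setConnectionTimeout(3_000); // acquire timeout: 3 s instead of waiting indefinitely
        return new HikariDataSource(config);
    }

    /** Illustrative handler: map pool exhaustion to a 503 so the load balancer routes around the pod. */
    static int handleRequest(HikariDataSource ds) {
        try (Connection conn = ds.getConnection()) {
            // ... run the query on conn ...
            return 200;
        } catch (SQLTransientConnectionException poolExhausted) {
            return 503; // fast, explicit failure instead of a 30-second hang
        } catch (Exception other) {
            return 500;
        }
    }
}
```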

The instrumentation. Track acquire-wait time as a histogram. If p99 wait is regularly above 50 ms, the pool is too small. If acquire timeouts ever fire, you have already hit pool exhaustion.
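One way to wire that up, sketched with HikariCP's Micrometer tracker and a Prometheus registry; the metric names in the comments are the ones the tracker typically publishes, so verify them against your own scrape output.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.metrics.micrometer.MicrometerMetricsTrackerFactory;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

public class PoolMetrics {
    public static void main(String[] args) {
        PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.internal:5432/app"); // placeholder URL
        config.setMetricsTrackerFactory(new MicrometerMetricsTrackerFactory(registry));

        try (HikariDataSource ds = new HikariDataSource(config)) {
            // Dashboards/alerts to build from the scrape:
            //   hikaricp.connections.acquire -> acquire-wait timer (alert if p99 > 50 ms)
            //   hikaricp.connections.pending -> threads queued waiting for a connection
            //   hikaricp.connections.timeout -> acquire timeouts, i.e. pool exhaustion
            System.out.println(registry.scrape());
        }
    }
}
```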

Antipatterns

Maxing out the pool to be safe. Wastes database capacity and starves other services sharing the instance. Tune to the actual, measured concurrency.

No metrics on pool state. Pool used / available / wait queue should be on every service dashboard. Most teams discover the pool is the problem during the incident, not before.

Same pool config for every service. A read-heavy API service has a very different concurrency profile from a batch processor. Tune per service.

What to do this week

Three moves. (1) Add pool-used and acquire-wait metrics to your top three services. (2) Set the acquire timeout to 3 seconds on each; you will surface latent pool exhaustion that has been hiding behind indefinite waits. (3) Add up the pools of every service pointing at the same database; if the total is approaching 200, plan for a connection pooler.