Python Performance: When CPython Is Enough
Python is ‘slow’ in benchmarks but rarely the bottleneck in production. Optimization order matters.
Where Python is slow
Python's reputation for slowness is mostly misplaced. CPython is slow per CPU instruction, but most production bottlenecks are not CPU-bound at all.
- Per instruction. CPython is slow compared to compiled languages; benchmarks show this clearly.
- Per developer hour. Python is fast to write and fast to debug; whether that outweighs slower execution depends on how much compute the workload actually burns.
- Real bottlenecks. Most Python production bottlenecks are I/O, database queries, or algorithm choice, not raw CPU speed.
- Profile first. Always profile before assuming Python is the problem; the data usually says otherwise.
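A minimal profiling sketch using the standard-library `cProfile` and `pstats` modules. The workload `slow_membership` and the helper `profile_call` are hypothetical stand-ins; in a real service you would profile the actual endpoint or job with representative input.

```python
import cProfile
import io
import pstats


def slow_membership(items, queries):
    # Hypothetical hot spot: `q in items` scans the whole list on every lookup.
    return sum(1 for q in queries if q in items)


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    data = list(range(5_000))
    result, report = profile_call(slow_membership, data, range(0, 10_000, 2))
    print(result)
    print(report)
```

The report shows where the time goes before any code changes; only then is it worth deciding which tier of optimization applies.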
Four-tier optimization
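Work through the tiers in order and stop at the first one that removes the bottleneck.
- Tier 1: profile, then fix the algorithm (data structures, query patterns, complexity).
- Tier 2: pure-Python optimization (built-ins, comprehensions, avoiding repeated work).
- Tier 3: NumPy / pandas / numba for numeric and array-heavy hot paths.
- Tier 4: Cython / Rust extension for a loop that profiling has proven hot.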
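A sketch of Tiers 1 and 2 using only the standard library; all function names are hypothetical illustrations. Tier 1 changes the algorithm (a set lookup instead of a list scan); Tier 2 keeps the algorithm but pushes the loop into a built-in, here `collections.Counter`, whose counting helper is implemented in C in CPython.

```python
import timeit
from collections import Counter


# Tier 1: algorithm fix. Same result, different complexity.
def common_ids_quadratic(a, b):
    # O(len(a) * len(b)): each `in` scans the whole list b.
    return [x for x in a if x in b]


def common_ids_linear(a, b):
    # O(len(a) + len(b)): build a set once; each lookup is O(1) on average.
    b_set = set(b)
    return [x for x in a if x in b_set]


# Tier 2: pure-Python optimization. Same algorithm, less interpreter overhead.
def word_counts_manual(words):
    counts = {}
    for w in words:
        if w in counts:
            counts[w] += 1
        else:
            counts[w] = 1
    return counts


def word_counts_builtin(words):
    # Counter does the counting loop inside its (C-accelerated) update path.
    return Counter(words)


if __name__ == "__main__":
    a = list(range(2_000))
    b = list(range(1_000, 3_000))
    for fn in (common_ids_quadratic, common_ids_linear):
        seconds = timeit.timeit(lambda: fn(a, b), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```

Timing both versions on representative data, as in the `__main__` block, is what confirms a change helped; Tiers 3 and 4 only come into play if these steps are not enough.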
When Cython/Rust
When profiling proves a hot Python loop is the bottleneck and the earlier tiers are exhausted, a native extension is the next step. The order matters: profile, then optimize, then port.
- Cython. When the hot loop is unavoidable; often 10x to 100x faster on tight loops; mature and battle-tested.
- Rust extension (PyO3). Cleaner ergonomics, growing ecosystem; modern alternative to Cython.
- Profiling gate. Both options only after profiling proves the loop is worth porting.
- Maintenance cost. Native extensions add build complexity; the team must maintain the toolchain.
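One way to make the profiling gate concrete is Amdahl's law (a standard formula, not something the section prescribes): if the hot loop accounts for a fraction of total time and a port makes that loop some factor faster, the end-to-end speedup is bounded by the part you did not port. The function name `overall_speedup` is a hypothetical helper.

```python
def overall_speedup(hot_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the hot fraction gets faster."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    # Unported time stays the same; ported time shrinks by local_speedup.
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / local_speedup)


if __name__ == "__main__":
    for fraction in (0.2, 0.5, 0.9):
        total = overall_speedup(fraction, 50)
        print(f"hot loop = {fraction:.0%} of time -> {total:.2f}x overall")
```

For example, a loop that takes 20% of request time, ported with a 50x speedup, makes the request only about 1.24x faster; at 90% the same port gives about 8.5x. The profile supplies the fraction, which is why it has to come first.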
PyPy niche
PyPy is an alternative Python implementation with a JIT compiler, largely compatible with CPython. It wins on specific workloads; check whether yours is one before switching.
- JIT win. Faster on CPU-bound, long-running workloads; the JIT needs a warm-up period before hot loops reach full speed, so short-lived scripts gain little.
- I/O neutral. Not faster on I/O-bound code; the JIT cannot speed up syscalls.
- Niche fit. CPU-heavy pure-Python code and long-running services with hot Python loops; outside that, stick with CPython.
- Compatibility risk. Some C extensions break on PyPy; test the full dependency tree before committing.
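A first-pass smoke test for the dependency tree, meant to be run under PyPy: try importing every dependency and report the failures. The helper `check_imports` and the dependency list are hypothetical; a successful import does not prove a C extension behaves correctly, so the project's full test suite under PyPy remains the real check.

```python
import importlib
import platform


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised while an extension initializes
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Hypothetical dependency list; in practice, derive it from your lock file.
    deps = ["json", "sqlite3", "numpy"]
    print("Runtime:", platform.python_implementation())
    for name, error in check_imports(deps).items():
        print(f"FAILED {name}: {error}")
```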
Antipatterns
- Cython before profiling. You end up optimizing the wrong code.
- PyPy without testing all dependencies. Some C extensions break.
- Rewriting in Go for ‘speed.’ Often the bottleneck is not Python at all.
What to do this week
Three moves. (1) Apply the four-tier order to your slowest production endpoint, starting with a profile. (2) Measure p99 latency before and after. (3) Document the win and ship the runbook so the team can reproduce it.
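A minimal sketch for step (2), assuming you can call the endpoint's handler in-process. The helpers `measure_latencies` and `p99` are hypothetical, and the two lambdas stand in for a real handler before and after an algorithm fix. This is a lab approximation; production p99 should come from request logs or your monitoring system.

```python
import statistics
import time


def p99(samples):
    """99th percentile of latency samples (collect many for a stable estimate)."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]


def measure_latencies(func, runs=200):
    """Call func repeatedly; return per-call wall-clock latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    before = measure_latencies(lambda: sum(range(100_000)))  # stand-in: loop-based handler
    after = measure_latencies(lambda: (100_000 - 1) * 100_000 // 2)  # stand-in: closed-form fix
    print(f"p99 before: {p99(before):.3f} ms")
    print(f"p99 after:  {p99(after):.3f} ms")
```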