Python Performance: When CPython Is Enough
Python is ‘slow’ in benchmarks, but it is rarely the bottleneck in production. The order in which you optimize matters.
Where Python is slow
Python is slow per CPU instruction but fast per developer hour.
Most production bottlenecks in Python services are I/O, database access, or algorithm choice, not the interpreter.
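To make the algorithm-choice point concrete, here is a minimal sketch. The function names and data are hypothetical; it only illustrates that swapping a list scan for a set lookup changes the complexity without leaving pure Python.

```python
import time


def count_known_slow(ids, known_ids):
    """Count ids present in known_ids using a list: each `in` scans the list."""
    known_list = list(known_ids)
    return sum(1 for i in ids if i in known_list)


def count_known_fast(ids, known_ids):
    """Same result using a set: each `in` is a hash lookup on average."""
    known_set = set(known_ids)
    return sum(1 for i in ids if i in known_set)


if __name__ == "__main__":
    # Hypothetical workload: small enough to run quickly, large enough to show a gap.
    ids = list(range(0, 5_000, 2))
    known = list(range(0, 5_000, 3))
    for fn in (count_known_slow, count_known_fast):
        start = time.perf_counter()
        matches = fn(ids, known)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {matches} matches in {elapsed:.4f}s")
```

Both functions return the same count; only the data structure differs. Actual timings depend on the machine and input sizes, so treat the printed numbers as illustrative.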
Four-tier optimization
- Tier 1: profile, then fix the algorithm.
- Tier 2: pure-Python optimization (built-ins, better data structures, caching).
- Tier 3: NumPy / pandas / Numba.
- Tier 4: Cython / Rust extension.
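Tier 1 starts with a profile. A minimal sketch using the standard-library cProfile and pstats modules follows; `slow_endpoint` is a hypothetical stand-in for a real handler, and `profile_call` is an illustrative helper, not part of any library.

```python
import cProfile
import io
import pstats


def slow_endpoint(n):
    """Hypothetical handler with a quadratic hot spot (list membership in a loop)."""
    seen = []
    for i in range(n):
        if i not in seen:  # linear scan on every iteration
            seen.append(i)
    return len(seen)


def profile_call(func, *args, top=5):
    """Run func under cProfile; return its result and a text report of the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_endpoint, 2_000)
    print(report)
```

The report is sorted by cumulative time, so the function to look at first is near the top. In this sketch the fix would be a Tier 1 one: replace the list with a set.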
When Cython/Rust
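Cython: for when a hot loop is unavoidable. Speedups of 10-100x are often cited for typed numeric loops, but the gain depends heavily on the workload.
Rust extension (PyO3): memory-safe, with modern tooling and a growing ecosystem.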
Both: only after profiling proves it’s worth it.
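One way to judge whether an extension is worth it is Amdahl's law: the overall speedup is limited by the fraction of runtime the hot loop actually accounts for. A small sketch, where `hot_fraction` is assumed to come from your own profile and `hot_speedup` is the gain you expect from the rewrite:

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only the hot part gets faster.

    hot_fraction: share of total runtime spent in the hot loop (0.0 to 1.0).
    hot_speedup:  factor by which the hot loop itself gets faster (> 0).
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0:
        raise ValueError("hot_speedup must be positive")
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)


if __name__ == "__main__":
    # Illustrative inputs only: the same 100x loop speedup at different runtime shares.
    for fraction in (0.1, 0.5, 0.9):
        print(f"hot loop = {fraction:.0%} of runtime -> "
              f"{overall_speedup(fraction, 100):.2f}x overall")
```

If the hot loop is only half of the runtime, even a 100x faster loop yields less than a 2x overall gain, which is why the profile comes first.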
PyPy niche
PyPy: an alternative Python implementation with a JIT compiler, largely compatible with CPython. Often faster on CPU-bound pure-Python code; not faster on I/O-bound work.
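Niche: long-running services and CPU-bound pure-Python code, where the JIT has time to warm up; scientific stacks built on C extensions tend to benefit less. Most production Python sticks to CPython.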
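If you do trial PyPy, it helps to run the same CPU-bound benchmark under both interpreters and tag each result with the implementation that produced it. A rough sketch; `cpu_bound_work` and `run_benchmark` are hypothetical names, and the workload is a placeholder for your own hot path:

```python
import platform
import time


def cpu_bound_work(n):
    """Placeholder pure-Python loop; substitute a representative hot path."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def run_benchmark(n=200_000):
    """Time the workload once and record which interpreter ran it."""
    start = time.perf_counter()
    cpu_bound_work(n)
    elapsed = time.perf_counter() - start
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython" or "PyPy"
        "version": platform.python_version(),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    print(run_benchmark())
```

A single short run understates a JIT's benefit because warm-up time is included, so for a fair comparison repeat the call several times in one process and compare the later runs.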
Antipatterns
- Cython before profiling. You end up optimizing the wrong code.
- PyPy without testing all dependencies. Some C extensions break or run slower.
- Rewriting in Go for ‘speed.’ Often the bottleneck is not Python.
What to do this week
Three moves. (1) Apply the four-tier order, starting with a profile, to your slowest production endpoint. (2) Measure p99 latency before and after. (3) Document the win and ship the runbook so the team can reproduce it.
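For the p99 measurement, a minimal sketch using the standard-library statistics module. It assumes you already have per-request latency samples in milliseconds; `p99` and `compare_p99` are illustrative helper names.

```python
import statistics


def p99(latencies_ms):
    """99th-percentile latency; statistics.quantiles needs at least two samples."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(latencies_ms, n=100)[98]


def compare_p99(before_ms, after_ms):
    """Return p99 before and after a change, plus the relative change in percent."""
    before = p99(before_ms)
    after = p99(after_ms)
    return {
        "before_ms": before,
        "after_ms": after,
        "change_pct": (after - before) / before * 100.0,
    }


if __name__ == "__main__":
    # Synthetic samples for illustration only; use real request latencies.
    before = [float(x) for x in range(1, 101)]
    after = [x / 2 for x in before]
    print(compare_p99(before, after))
```

A negative `change_pct` means the p99 improved. Collect both sample sets under comparable load, otherwise the comparison says more about traffic than about the change.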