Energy and Sustainability in ML
Training a frontier model consumes on the order of 100 GWh, closer to a small town's annual electricity use than a small country's. Inference at scale dwarfs training. The footprint matters, and some jurisdictions now regulate its reporting.
The math
Training GPT-4-class models: ~100 GWh, roughly the annual electricity use of ten thousand US households (a US household averages ~10.5 MWh/year). A single H100 draws ~0.7 kW at TDP; with host, networking, and cooling overhead, budget roughly 1.5 kW per GPU. A 1,000-GPU inference cluster: ~1.5 MW continuous, ~13 GWh/year. Production fleets run tens of thousands of GPUs, so inference dwarfs training at scale.
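The cluster arithmetic can be sketched as a quick back-of-envelope script. The per-GPU draw is an assumption (TDP plus node and cooling overhead), not a measurement:

```python
# Back-of-envelope inference cluster energy.
# GPU_POWER_KW is an assumed figure: ~0.7 kW TDP plus host,
# networking, and cooling overhead. Not a measured value.

GPU_POWER_KW = 1.5
CLUSTER_GPUS = 1_000
HOURS_PER_YEAR = 8_760

cluster_mw = GPU_POWER_KW * CLUSTER_GPUS / 1_000
annual_gwh = cluster_mw * HOURS_PER_YEAR / 1_000

print(f"Cluster draw: {cluster_mw:.1f} MW continuous")
print(f"Annual energy: {annual_gwh:.0f} GWh/year")
```

Swapping in your own measured per-GPU draw and fleet size is the whole exercise; the structure of the estimate does not change.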
Efficiency gains
Per-token energy has dropped roughly 10x in three years through:
- Quantisation (4-bit weights cut memory traffic, the bottleneck in decode, and reduce compute where hardware supports low-precision math).
- Mixture of experts (only a few experts are active per token, so far fewer FLOPs per token than a dense model of the same size).
- Speculative decoding (a small draft model proposes tokens that the large model verifies in one forward pass, yielding more accepted tokens per pass).
- Better hardware (H100 → H200 → B100, with each generation improving performance per watt).
The trajectory: per-token energy keeps falling faster than usage grows for now.
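How these techniques compound into an order-of-magnitude saving can be sketched with illustrative multipliers. Every number below is an assumption chosen to show the compounding effect, not a benchmark:

```python
# Sketch: compounding per-token energy savings.
# Baseline and all multipliers are illustrative assumptions.

BASELINE_J_PER_TOKEN = 3.0  # assumed fp16 dense-model baseline

multipliers = {
    "4-bit quantisation": 0.5,    # assumed: memory-bound decode roughly halves
    "mixture of experts": 0.5,    # assumed: ~2x fewer active params per token
    "speculative decoding": 0.7,  # assumed: ~1.4x tokens per target-model pass
    "newer hardware": 0.6,        # assumed: generational perf/W gain
}

energy = BASELINE_J_PER_TOKEN
for name, m in multipliers.items():
    energy *= m
    print(f"after {name}: {energy:.3f} J/token")

print(f"overall reduction: {BASELINE_J_PER_TOKEN / energy:.0f}x")
```

Four independent ~1.4-2x wins multiply to roughly 10x, which is why no single technique explains the trend.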
Reporting
EU companies above certain size thresholds must report Scope 1, 2, and 3 emissions, including ML compute, under the Corporate Sustainability Reporting Directive (CSRD). Cloud providers increasingly publish per-region emissions data.
What teams should do
- Track per-feature compute and emissions.
- Pick lower-carbon regions when latency permits.
- Use efficiency techniques (quantisation, caching, routing) for both cost and emissions.
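The first two practices can be combined into one small accounting function. The grid intensities and workload figures below are illustrative placeholders, and the region names are hypothetical, not any provider's actual region identifiers:

```python
# Sketch: per-feature emissions tracking and carbon-aware region choice.
# Grid intensities (g CO2e/kWh) are illustrative placeholders, not live data.

GRID_G_CO2_PER_KWH = {
    "us-east": 380,
    "eu-north": 40,
    "asia-se": 480,
}

def feature_emissions_kg(gpu_hours: float, gpu_kw: float, region: str) -> float:
    """Emissions for one feature's inference workload, in kg CO2e."""
    energy_kwh = gpu_hours * gpu_kw
    return energy_kwh * GRID_G_CO2_PER_KWH[region] / 1_000

# Same workload, different regions: latency budget permitting, pick the greener one.
workload = dict(gpu_hours=2_000, gpu_kw=1.5)  # assumed monthly feature footprint
for region in GRID_G_CO2_PER_KWH:
    kg = feature_emissions_kg(region=region, **workload)
    print(f"{region}: {kg:.0f} kg CO2e")
```

The same workload can differ by 10x in emissions purely on region choice, which is why placement is usually the cheapest lever on this list.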