By Samson Tanimawo, PhD. Published Dec 30, 2026.

Energy and Sustainability in ML

Training a frontier model can consume gigawatt-hours of electricity. Over a deployed model's lifetime, inference at scale can dwarf training. The footprint matters, and in some jurisdictions it is subject to reporting requirements.

The math

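Training a frontier model emits roughly 100-1000 tons of CO2-equivalent. Inference energy depends on usage. Typical large-model serving draws on the order of 100-1000 W of accelerator power while processing requests, so the energy per request depends on how long each request occupies the hardware. Aggregate AI energy use is now significant for hyperscalers. It accounts for a single-digit percentage of grid electricity in some regions, and that share is growing.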

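The training math. Frontier model training typically takes 10K-100K GPU-days. At 700 W per H100, one GPU uses 16.8 kWh per day at the board. Once facility overhead is included (assuming a PUE of about 1.5), that rises to roughly 25 kWh per GPU-day. Total training energy: 250 MWh-2.5 GWh. Emissions depend on the grid mix. At an average of 0.5 kg CO2e/kWh, this gives 125-1250 tons CO2-equivalent.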
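The arithmetic above can be checked with a few lines of Python. This is a minimal sketch. It assumes the 250 MWh-2.5 GWh range corresponds to 10K-100K GPU-days, and that a facility overhead (PUE) of about 1.5 is what turns 700 W × 24 h = 16.8 kWh into roughly 25 kWh per GPU-day. The PUE value is an assumption, not a figure from the text.

```python
# Back-of-envelope training energy and emissions, using the figures above.
# Assumption: 700 W per H100 board plus ~1.5x facility overhead (PUE),
# which turns 16.8 kWh per GPU-day at the board into ~25 kWh per GPU-day.

GPU_POWER_KW = 0.7       # H100 board power, from the text
PUE = 1.5                # assumed facility overhead (cooling, power conversion)
GRID_KG_PER_KWH = 0.5    # average grid intensity, from the text


def training_energy_kwh(gpu_days: float, gpu_kw: float = GPU_POWER_KW, pue: float = PUE) -> float:
    """Facility-level energy for a training run measured in GPU-days."""
    return gpu_days * 24 * gpu_kw * pue


def training_emissions_t(energy_kwh: float, kg_per_kwh: float = GRID_KG_PER_KWH) -> float:
    """Tonnes of CO2-equivalent for a given energy use and grid intensity."""
    return energy_kwh * kg_per_kwh / 1000


for gpu_days in (10_000, 100_000):
    e = training_energy_kwh(gpu_days)
    print(f"{gpu_days:>7,} GPU-days: {e / 1000:,.0f} MWh, {training_emissions_t(e):,.0f} t CO2e")
```

Changing `kg_per_kwh` is the quickest way to see how much the grid mix matters for the same run.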

The inference math. The energy per query is small, a fraction of a watt-hour of accelerator time. At billions of queries per day, however, it becomes substantial. A major LLM API serving a billion daily queries might consume 10-50 GWh per year. That works out to roughly 0.03-0.14 Wh per query, and the total is comparable to the residential electricity use of a small town.
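The 10-50 GWh figure can be converted into energy per query, and back. The sketch below uses only the text's illustrative inputs (a billion queries per day). It is unit arithmetic, not a measurement.

```python
# Convert an annual fleet energy figure into energy per query, and back.
# Inputs are the text's illustrative numbers, not measurements.

WH_PER_GWH = 1e9
JOULES_PER_WH = 3600


def wh_per_query(annual_gwh: float, queries_per_day: float) -> float:
    """Average energy per query implied by annual consumption."""
    return annual_gwh * WH_PER_GWH / (queries_per_day * 365)


def annual_gwh(wh_each: float, queries_per_day: float) -> float:
    """Annual consumption implied by a per-query energy figure."""
    return wh_each * queries_per_day * 365 / WH_PER_GWH


for gwh in (10, 50):
    wh = wh_per_query(gwh, 1e9)
    print(f"{gwh} GWh/yr at 1B queries/day -> {wh:.3f} Wh ({wh * JOULES_PER_WH:.0f} J) per query")
```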

The "growing fast" reality. AI compute demand grows ~3-5x annually. Energy consumption follows. Hyperscaler datacenters that were 5% of regional grid load are projected to be 15-25% within 5 years in some regions. Grid operators are noticing.
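The projection implies a compound annual growth rate for the datacenter share of regional load. Here is a worked check, assuming the 5% starting share and the five-year horizon from the text, with $s_0$ as today's share, $s_5$ as the share in five years, and $g$ as the annual growth rate:

```latex
\begin{aligned}
g &= \left(\frac{s_5}{s_0}\right)^{1/5} - 1, \qquad s_0 = 0.05 \\
s_5 = 0.15 &\;\Rightarrow\; g = 3^{1/5} - 1 \approx 0.246 \\
s_5 = 0.25 &\;\Rightarrow\; g = 5^{1/5} - 1 \approx 0.380
\end{aligned}
```

On these numbers, the datacenter share of load grows roughly 25-38% per year. That is much slower than 3-5x annual growth in compute demand. The difference has to be absorbed by efficiency gains per unit of compute and by growth in total grid load.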

The grid implications. New datacenters strain grids designed for slower load growth. Some regions impose moratoriums on new datacenters. Permitting timelines lengthen. The grid becomes a binding constraint on AI infrastructure expansion in some places.

Efficiency gains

Efficiency improves across multiple axes:

The hardware-generation gain. Each generation is 1.5-3x more energy-efficient at the same workload. Compounding across generations: 2024-2026's frontier hardware is 5-10x more efficient than 2020 hardware. The trend continues.
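Gains compound multiplicatively across generations. The text does not say how many generations separate 2020 hardware from 2024-2026 hardware, so two and three are assumed below. Inverting the product shows which per-generation factor $r$ the 5-10x total $G$ requires:

```latex
\begin{aligned}
G &= \prod_{i=1}^{n} r_i = r^{n} \quad \text{(same factor } r \text{ every generation)} \\
n = 2:\; r &= G^{1/2} \in \left[5^{1/2}, 10^{1/2}\right] \approx [2.24,\ 3.16] \\
n = 3:\; r &= G^{1/3} \in \left[5^{1/3}, 10^{1/3}\right] \approx [1.71,\ 2.15]
\end{aligned}
```

Both ranges overlap the 1.5-3x per-generation figure. The two numbers in the text are therefore consistent if two to three generations separate the hardware.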

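The quantisation gain. INT4 or INT8 inference uses substantially less energy than FP16. The hardware performs the same number of operations, but each low-precision operation is cheaper. Moving the weights from memory also takes half (INT8) or a quarter (INT4) of the bytes. Combined with hardware support for low-precision formats (Blackwell's FP4), the savings are dramatic for inference-heavy workloads.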
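The core idea is to store small integers plus one floating-point scale instead of 16-bit floats. The sketch below shows symmetric per-tensor quantisation in pure Python. It is illustrative only. Production stacks typically quantise per channel or per group and run dedicated integer kernels. The 7B parameter count is hypothetical.

```python
# Minimal symmetric per-tensor quantisation sketch (pure Python, illustrative only).
# Stores signed integers plus one float scale; real inference stacks use
# per-channel/per-group scales and hardware integer kernels.

def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def weight_bytes(n_params: int, bits: int) -> float:
    """Bytes needed to store (and stream from memory) the weights."""
    return n_params * bits / 8


weights = [0.31, -1.20, 0.05, 0.88, -0.47]
for bits in (8, 4):
    q, s = quantize(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, dequantize(q, s)))
    print(f"INT{bits}: q={q}, max abs error={err:.4f}")

n = 7_000_000_000  # hypothetical 7B-parameter model
for name, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"{name}: {weight_bytes(n, bits) / 1e9:.1f} GB of weights")
```

The rounding error grows as the bit width shrinks. That is the quality trade-off to validate before switching a production model.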

The distillation gain. A distilled model 10x smaller uses roughly 10x less energy per query. For volume use cases where the distilled model is "good enough", the energy savings are first-order. Distillation is a one-time cost, while the savings accrue on every query the smaller model serves.
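Because distillation is a one-time cost, the useful question is how many queries it takes to pay that cost back. The sketch below computes that break-even point. Every number in it is a hypothetical placeholder.

```python
# Break-even sketch for distillation: one-time energy cost vs per-query savings.
# All numbers below are hypothetical placeholders, not measurements.

def break_even_queries(distill_kwh: float, teacher_wh: float, student_wh: float) -> float:
    """Queries after which the distillation energy has been paid back."""
    saving_wh = teacher_wh - student_wh
    if saving_wh <= 0:
        return float("inf")  # the student never pays the cost back
    return distill_kwh * 1000 / saving_wh


# Hypothetical: 50 MWh to distill; the student is ~10x cheaper per query.
queries = break_even_queries(distill_kwh=50_000, teacher_wh=0.1, student_wh=0.01)
print(f"pays back after ~{queries:,.0f} queries")
```

With these placeholder numbers, payback arrives after a few hundred million queries. At a billion queries per day, that is well under a day of traffic, which is why high-volume use cases benefit most.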

The caching gain. A response-cache hit uses almost no energy compared with running the model. For applications with high cache-hit rates (common prompts, repeated queries), caching is the largest single energy lever. Prompt caching reuses the computation for a repeated prompt prefix. Combined with response caching, it often achieves 30-70% cache-hit rates.
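A response cache only needs a stable key per prompt and some accounting. The sketch below keys on a whitespace- and case-normalised prompt and tracks hit rate and estimated energy saved. `run_model` and the per-call energy figure are hypothetical stand-ins for a real model call and a measured value.

```python
# Minimal response cache keyed on a normalised prompt, with hit-rate and
# energy-saved accounting. `run_model` and `wh_per_call` are hypothetical.
import hashlib
from typing import Callable


class ResponseCache:
    def __init__(self, run_model: Callable[[str], str], wh_per_call: float):
        self.run_model = run_model
        self.wh_per_call = wh_per_call
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalise whitespace and case so trivially different prompts share an entry.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        answer = self.run_model(prompt)
        self.store[key] = answer
        return answer

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def wh_saved(self) -> float:
        # Treats a hit as free, the same approximation the text makes.
        return self.hits * self.wh_per_call


cache = ResponseCache(run_model=lambda p: f"answer to: {p}", wh_per_call=0.1)
for prompt in ["Reset my password", "reset my  password", "Billing address?", "Reset my password"]:
    cache.get(prompt)
print(f"hit rate {cache.hit_rate:.0%}, ~{cache.wh_saved:.1f} Wh saved")
```

A production cache would also need eviction and expiry. Case-folding is only safe where it cannot change the correct answer.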

The routing gain. Routing easy queries to small models saves energy in proportion to the model size difference: a 10x smaller model uses roughly 10x less energy on each routed query. Routing is operationally complex but high-value.
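The saving from routing is the traffic-weighted average of the two models' per-query energy. The sketch below uses a deliberately crude length-and-keyword heuristic as the router. That heuristic and both energy figures are hypothetical; real routers typically use a trained classifier.

```python
# Illustrative router: send "easy" queries to a small model, the rest to a large one.
# The heuristic and the energy figures are hypothetical.

LARGE_WH = 0.10   # hypothetical energy per query, large model
SMALL_WH = 0.01   # hypothetical: ~10x smaller model, ~10x less energy

HARD_MARKERS = ("prove", "analyse", "analyze", "step by step", "write code")


def route(query: str) -> str:
    q = query.lower()
    if len(q.split()) > 50 or any(m in q for m in HARD_MARKERS):
        return "large"
    return "small"


def blended_wh(queries: list[str]) -> float:
    """Average energy per query under routing."""
    cost = {"large": LARGE_WH, "small": SMALL_WH}
    return sum(cost[route(q)] for q in queries) / len(queries)


traffic = [
    "What are your opening hours?",
    "Translate 'thank you' into French",
    "Prove that the algorithm terminates",
    "Track my order",
]
print(f"{blended_wh(traffic):.4f} Wh/query vs {LARGE_WH:.4f} Wh/query without routing")
```

The operational complexity lives in the router. A misrouted hard query costs quality, so the routing rule needs its own evaluation.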

Reporting

Regulatory pressure is increasing. The EU AI Act requires energy reporting for general-purpose AI (foundation) models. Some jurisdictions mandate carbon disclosure for datacenters above size thresholds. ISO/IEC standards are emerging. Companies are starting to publish training energy alongside model releases.

The EU requirement. Providers of general-purpose AI (GPAI) models must report energy consumption during training and for average inference. The figures form part of the GPAI documentation and are auditable. Downstream developers can use them for their own carbon accounting.

The ISO/IEC standard. ISO/IEC 42001 (AI management systems) includes sustainability provisions. Adopting it is voluntary but increasingly expected by enterprise customers, and adoption is growing through 2026-2027.

The "what to report" reality. Standard reporting includes training energy in kWh, training emissions in tCO2e (using location-specific grid factors), average inference energy per query, and inference emissions per query. Methodologies are converging but are not yet fully standardised.
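The four reported quantities fit in one small record. The sketch below derives the two emissions figures from energy and a location-specific grid factor. The field names and example values are hypothetical and do not follow any official reporting template.

```python
# Sketch of a per-model energy report with the four fields listed above.
# Field names and example values are hypothetical; use the grid factor for
# the region where the training or inference actually ran.
from dataclasses import dataclass


@dataclass
class EnergyReport:
    training_kwh: float
    training_grid_kg_per_kwh: float     # location-specific grid factor
    inference_wh_per_query: float
    inference_grid_kg_per_kwh: float

    @property
    def training_tco2e(self) -> float:
        return self.training_kwh * self.training_grid_kg_per_kwh / 1000

    @property
    def inference_gco2e_per_query(self) -> float:
        # Wh -> kWh and kg -> g are both factors of 1000, so they cancel.
        return self.inference_wh_per_query * self.inference_grid_kg_per_kwh


report = EnergyReport(
    training_kwh=250_000,
    training_grid_kg_per_kwh=0.5,
    inference_wh_per_query=0.05,
    inference_grid_kg_per_kwh=0.3,
)
print(f"training: {report.training_kwh:,.0f} kWh, {report.training_tco2e:.0f} tCO2e")
print(f"inference: {report.inference_wh_per_query} Wh, {report.inference_gco2e_per_query:.3f} gCO2e per query")
```

Keeping training and inference grid factors separate matters because the two often run in different regions.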

The market pressure. Some enterprise customers ask about AI carbon footprint as part of vendor evaluation. Hyperscalers report aggregate AI energy in sustainability reports. The pressure is real; the response is investment in efficiency.

The greenwashing concern. "AI is green" claims often don't bear scrutiny. Companies that improve hardware efficiency can still grow total consumption faster than efficiency improves. The honest framing: efficiency is improving and consumption is growing; both are true.

What teams should do

Practical steps:

The tracking-first principle. You can't optimise what you don't measure. Build dashboards: energy per task type, per feature, per customer. Make the cost of inefficient choices visible.
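Once each request carries an energy estimate, the dashboard breakdowns are simple group-by sums. For example, the estimate could be measured accelerator energy apportioned by tokens. In the sketch below, the log record shape and its field names are hypothetical.

```python
# Aggregate per-request energy into the dashboard breakdowns described above.
# The log record shape and its field names are hypothetical.
from collections import defaultdict

requests = [
    {"task": "summarise", "feature": "inbox", "customer": "acme", "wh": 0.12},
    {"task": "classify", "feature": "inbox", "customer": "acme", "wh": 0.01},
    {"task": "summarise", "feature": "reports", "customer": "globex", "wh": 0.15},
    {"task": "chat", "feature": "support", "customer": "globex", "wh": 0.08},
]


def energy_by(records: list[dict], dimension: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r[dimension]] += r["wh"]
    # Largest consumers first, which is what a dashboard usually wants.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


for dim in ("task", "feature", "customer"):
    print(dim, {k: round(v, 3) for k, v in energy_by(requests, dim).items()})
```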

The easy-wins principle. Quantisation, caching, and routing are low-effort and high-impact. Do them before harder optimisations; they offer the highest return per engineer-hour.

The provider-choice principle. Inference API providers vary in energy efficiency by 2-3x. The differences come from hardware choices, datacenter efficiency, and model architecture. Picking efficient providers reduces your scope-3 emissions.

The location principle. The carbon intensity of the grid mix varies 5-10x across cloud regions, and far more at the extremes. The same workload has dramatically different emissions in a Quebec datacenter (largely hydro) than in a Wyoming datacenter (largely coal). For workloads with location flexibility, the choice matters.
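For a fixed workload, emissions scale linearly with grid carbon intensity, so region choice is a multiplier on everything else. The sketch below compares one hypothetical workload across three illustrative regions. The intensities are placeholders, not published figures. Substitute your provider's or grid operator's numbers for real decisions.

```python
# Same workload, different regions: emissions scale with grid carbon intensity.
# The intensities below are illustrative placeholders, not published figures.

WORKLOAD_MWH = 100.0  # hypothetical monthly workload energy

GRID_G_PER_KWH = {
    "hydro-heavy region": 30,
    "mixed-grid region": 350,
    "coal-heavy region": 800,
}


def tonnes_co2e(mwh: float, g_per_kwh: float) -> float:
    # MWh * 1000 = kWh; kWh * g/kWh = g; g / 1e6 = tonnes.
    return mwh * 1000 * g_per_kwh / 1e6


for region, intensity in sorted(GRID_G_PER_KWH.items(), key=lambda kv: kv[1]):
    print(f"{region:<20} {tonnes_co2e(WORKLOAD_MWH, intensity):6.1f} t CO2e")
```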

The decommissioning principle. Surveys at large companies often find 20-40% of running AI infrastructure unused or underused. Stale experiments, old A/B tests, idle endpoints. Aggressive decommissioning recovers the energy and the cost.
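A periodic audit can start as a simple rule over utilisation data. The sketch below sorts endpoints into decommission, consolidate, or keep. The record shape, endpoint names, and the 5% / 30% thresholds are all hypothetical choices to tune for your own fleet.

```python
# Flag idle or underused endpoints from utilisation stats.
# The record shape, names, and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    avg_utilisation: float    # 0.0-1.0 over the review window
    requests_last_30d: int


def classify(ep: Endpoint, idle_below: float = 0.05, underused_below: float = 0.30) -> str:
    if ep.requests_last_30d == 0 or ep.avg_utilisation < idle_below:
        return "decommission"
    if ep.avg_utilisation < underused_below:
        return "consolidate"
    return "keep"


fleet = [
    Endpoint("ab-test-2025-q3", 0.00, 0),
    Endpoint("legacy-ranker", 0.12, 4_200),
    Endpoint("prod-chat", 0.71, 9_800_000),
]
for ep in fleet:
    print(f"{ep.name:<18} {classify(ep)}")
```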

Common antipatterns

Treating efficiency as separate from cost. The two overlap heavily: optimising for cost usually optimises for energy too.

Greenwashing claims. "AI is sustainable" without numbers. Be honest about consumption growth.

Ignoring location. Grid mix varies dramatically. Pick regions with cleaner grids for flexible workloads.

Skipping decommissioning. Stale infrastructure compounds. Audit periodically.

What to do this week

Three moves. (1) Audit running AI infrastructure for unused/idle systems. The decommissioning often recovers 10-30% of capacity. (2) For one production model, measure energy per query. The number tells you where optimisation matters most. (3) When choosing AI providers, ask for energy-efficiency disclosure. The market signal pushes vendors toward efficiency.
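For move (2), one approach is to sample accelerator power at a fixed interval while the model serves traffic. Integrate the samples over the window, then divide by the queries served. The sketch below assumes the samples come from something like `nvidia-smi --query-gpu=power.draw --format=csv` polled once per second. The readings and query count are made up. It counts accelerator power only, not CPU, networking, or cooling overhead.

```python
# Estimate energy per query from power samples taken while a model serves traffic.
# Readings and query count are hypothetical; accelerator power only.

def energy_wh(samples_w: list[float], interval_s: float) -> float:
    """Trapezoidal integral of power samples (W) taken every `interval_s` seconds."""
    if len(samples_w) < 2:
        return 0.0
    joules = sum((a + b) / 2 * interval_s for a, b in zip(samples_w, samples_w[1:]))
    return joules / 3600


samples = [310.0, 540.0, 610.0, 590.0, 620.0, 400.0]  # hypothetical, one per second
queries_served = 12                                   # hypothetical, same window

total = energy_wh(samples, interval_s=1.0)
print(f"{total:.4f} Wh over the window, {total / queries_served:.5f} Wh per query")
```

If the accelerator is shared or idles between requests, decide explicitly whether idle power is attributed to the queries. That choice can change the per-query number substantially.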