GPU Economics: H100 vs H200 vs MI300X
The hardware decision drives every other line of the ML budget. Here is what each chip actually delivers, where AMD's MI300X fits, and the cost-per-token math you should run.
H100
NVIDIA's 2022-era flagship and still the workhorse of cloud AI: 80GB HBM3, 3.35 TB/s memory bandwidth (SXM), ~2 PFLOPS dense FP8. Mature software stack (CUDA, TensorRT). Available everywhere; spot pricing $2-4/hr in 2026.
H200
2024 refresh: same compute as the H100, but 141GB HBM3e and 4.8 TB/s bandwidth. That is a major win for memory-bound inference (long-context LLMs): roughly a 30-50% inference throughput uplift with near-equal training performance; the back-of-envelope sketch below shows where that number comes from. Spot $4-6/hr.
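To see why bandwidth dominates decode-time inference: each generated token must stream the full weight set through memory, so the per-sequence ceiling is roughly bandwidth divided by model bytes. A minimal sketch; the 70B FP8 model size is an illustrative assumption, not a benchmark:

```python
# Back-of-envelope decode throughput: generation is memory-bound, so the
# per-sequence ceiling is bandwidth / bytes streamed per token.
# Assumes weight reads dominate traffic (ignores KV-cache reads and overlap);
# batching amortises weight reads and raises aggregate throughput.

MODEL_BYTES = 70e9  # 70B params at FP8 (~1 byte/param) -- illustrative assumption

BANDWIDTH_TBS = {"H100": 3.35, "H200": 4.8, "MI300X": 5.3}

for gpu, bw in BANDWIDTH_TBS.items():
    ceiling = bw * 1e12 / MODEL_BYTES  # tokens/sec, single sequence
    print(f"{gpu}: ~{ceiling:.0f} tok/s per-sequence ceiling")
```

This prints roughly 48, 69, and 76 tok/s: the H200's 4.8/3.35 ≈ 1.4x bandwidth ratio is exactly where the quoted 30-50% inference uplift lands.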
AMD MI300X
192GB HBM3, 5.3 TB/s bandwidth. Compute roughly comparable to H100 in FP8. Software stack (ROCm, vLLM-AMD) has matured to production-grade in 2025. Often $1.5-3/hr, the cost leader on per-hour pricing.
The catch: model coverage and operational tooling still trail NVIDIA's, and top-of-stack optimisations (FlashAttention-2, speculative decoding) work but typically land later than their CUDA counterparts.
Blackwell (B100, B200)
NVIDIA's 2025 generation: massive memory (192GB+), 8 PFLOPS FP4, designed specifically for trillion-parameter-model inference. Supply-constrained well into 2026; expect $8-12/hr when available. Note the arithmetic: at $10/hr against a $5/hr H200, a B200 must deliver twice the H200's throughput just to break even on cost per token.
Cost-per-token math
The decision metric should be cost per million tokens of useful throughput, not $/hr; the conversion is sketched in code after this list. For inference workloads:
- H100: ~$0.30-0.50 per million tokens for 70B-class models.
- H200: ~$0.20-0.35 (memory bandwidth wins on inference).
- MI300X: ~$0.15-0.30 (cheaper hourly + larger memory).
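A minimal sketch of that conversion. The hourly prices and throughput figures below are hypothetical placeholders; substitute measured throughput for your own model and serving stack, since that is where the real variance lives:

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly GPU price and sustained throughput into $/Mtok."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1e6

# Hypothetical inputs -- replace with benchmarks of your workload.
fleet = {
    "H100":   (3.0, 2400),   # ($/hr, aggregate tok/s across batched requests)
    "H200":   (5.0, 5200),
    "MI300X": (2.2, 3600),
}
for gpu, (price, tps) in fleet.items():
    print(f"{gpu}: ${cost_per_million_tokens(price, tps):.2f}/Mtok")
```

Note that the two inputs matter equally: halving $/hr and halving tok/s cancel exactly, which is why a cheap chip with a slow serving stack can still lose on $/Mtok.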
For most teams, MI300X is the cost-optimal choice in 2026, with H200 as the safe NVIDIA fallback. H100 remains relevant for legacy compatibility.