AI & ML · Advanced · By Samson Tanimawo, PhD · Published Nov 3, 2026

AI Hardware: Custom ASICs

Beyond NVIDIA: Cerebras, Groq, Tenstorrent, and dedicated inference accelerators from cloud providers. These chips differ in real ways, and the cost economics matter.

Cerebras

Wafer-scale chips with massive on-chip memory. Reported inference speeds exceed 1,000 tokens/s on Llama 70B. Cerebras is particularly strong on workloads that fit in its memory hierarchy. Pricing is competitive with H100-based serving on a $/token basis.
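The $/token comparison comes down to simple arithmetic: divide what an hour of hardware costs by the tokens it produces in that hour. Below is a minimal sketch of that calculation. The function name is hypothetical, and the price and throughput passed in are illustrative placeholders, not vendor quotes.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Return the cost in USD of generating one million tokens.

    Assumes the accelerator is kept fully busy for the whole hour;
    real utilization is lower, which raises the effective cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to show the calculation.
example = cost_per_million_tokens(hourly_price_usd=36.0, tokens_per_second=1000.0)
print(round(example, 2))  # prints 10.0
```

Running the same function with your own measured throughput and contracted price for each platform gives a like-for-like comparison, provided the throughput is measured on the same model and batch size.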

Groq

Custom LPU (Language Processing Unit). Execution is deterministic and latency is extremely low. The chip is inference-only. Time to first token is under 100 ms on most LLM sizes. It is a niche product, but a strong one for latency-critical apps.
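To see why first-token latency matters separately from throughput, it helps to model a streamed response as the time to first token plus the time to decode the remaining tokens. The sketch below assumes a constant decode rate. The function name and the numbers in the example are illustrative, not measurements of any vendor's hardware.

```python
def response_latency_s(ttft_s: float, output_tokens: int, decode_tokens_per_s: float) -> float:
    """Estimate end-to-end latency in seconds for one streamed response.

    The first token arrives after ttft_s; each remaining token is assumed
    to arrive at a steady decode rate.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    if decode_tokens_per_s <= 0:
        raise ValueError("decode_tokens_per_s must be positive")
    return ttft_s + (output_tokens - 1) / decode_tokens_per_s


# Hypothetical request: 100 ms to first token, 201 output tokens at 500 tokens/s.
print(response_latency_s(ttft_s=0.1, output_tokens=201, decode_tokens_per_s=500.0))
```

For short replies the first-token term dominates, which is where a low-latency design pays off. For long generations the decode rate matters more.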

Tenstorrent

RISC-V-based with an open architecture; the software stack is improving. This is the “cost-leader” play. Adoption is growing among cloud providers looking for NVIDIA alternatives.

AWS Inferentia / Trainium

Amazon’s in-house chips: Trainium for training, Inferentia for inference. Cost per token is strong if you’re committed to AWS. The software stack (Neuron SDK) is decent.

Where each wins

Latency-critical, high-volume inference: Groq or Cerebras.
Cost-sensitive AWS workloads: Inferentia.
Mainstream training and inference: still NVIDIA H200/B200, with AMD MI300 closing the gap.
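The guidance above can be written as a small decision helper. This is only a sketch: the function name, the boolean inputs, and the order in which the rules are checked are assumptions made for illustration, and a real selection would also weigh model size, software support, and availability.

```python
def suggest_accelerator(training: bool, latency_critical: bool, committed_to_aws: bool) -> str:
    """Map a workload description to the article's rough recommendation.

    The rule ordering is an assumption: training is checked first because
    the latency-focused options are discussed here for inference.
    """
    mainstream = "NVIDIA H200/B200 (AMD MI300 as an alternative)"
    if training:
        return mainstream
    if latency_critical:
        return "Groq or Cerebras"
    if committed_to_aws:
        return "AWS Inferentia"
    return mainstream


# Example: a chat product where response time matters most.
print(suggest_accelerator(training=False, latency_critical=True, committed_to_aws=False))
```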