Buying an AI Platform
Buyer's guide.
The buying question
AI platforms span model serving, training infrastructure, prompt management, and orchestration. Few buyers need all four; most need one or two.
Default to managed services for serving (Bedrock, Vertex AI, Azure OpenAI) and prompt management (LangSmith, Helicone). Self-host only for compliance or cost reasons.
Buy once, but build in a 12-month checkpoint. AI platform pricing is volatile, and a long contract leaves you paying 2024 prices for 2026 capabilities.
Evaluate by use case
Inference at scale: compare latency, throughput, model availability, and fallback behaviour. Bedrock and Vertex are within 20% of each other on most metrics.
Fine-tuning: compare training infrastructure, dataset management, and evaluation tooling. Specialised vendors (Together AI, Modal) outpace hyperscalers here.
Agentic workloads: compare orchestration (LangGraph, Anthropic's Claude Agent SDK), tool registries, and observability. This market is newer, and vendor lock-in risk is real.
Hidden costs
Per-token pricing dominates. A 1B token/month workload at $3/1M tokens is $3k/month; at $15/1M tokens it is $15k/month.
Egress fees on hyperscalers. Calling Bedrock from outside AWS can add 5 to 10% in data-transfer charges.
Observability and prompt eval tooling. LangSmith, Helicone, or a homegrown stack each add 5 to 15% on top of model spend.
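The three cost lines above can be combined into a rough monthly estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes both the egress and tooling overheads are charged as fractions of model spend, which is one reading of the percentages quoted above.

```python
def monthly_cost(tokens_per_month, price_per_million, egress_rate=0.0, tooling_rate=0.0):
    """Estimate monthly spend in dollars.

    Assumption: egress_rate and tooling_rate are fractions of model spend
    (e.g. 0.05 for 5%), not independently metered charges.
    """
    model = tokens_per_month / 1_000_000 * price_per_million
    egress = model * egress_rate
    tooling = model * tooling_rate
    return {
        "model": model,
        "egress": egress,
        "tooling": tooling,
        "total": model + egress + tooling,
    }


if __name__ == "__main__":
    # 1B tokens/month at the two price points used in the text,
    # with the low and high ends of the overhead ranges.
    for price in (3.0, 15.0):
        low = monthly_cost(1_000_000_000, price, egress_rate=0.05, tooling_rate=0.05)
        high = monthly_cost(1_000_000_000, price, egress_rate=0.10, tooling_rate=0.15)
        print(f"${price:.0f}/1M tokens: ${low['total']:,.0f} to ${high['total']:,.0f} per month")
```

Under these assumptions the overheads add between 10% and 25% to the headline token bill, so they are worth including in any quote comparison.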
Contract terms to negotiate
Token rate cards with annual review. Avoid 3-year locks at current rates; market prices drop 30 to 50% per year.
SOC 2 Type II, GDPR DPA, data-handling addendum. Anyone selling without these is too small for production.
Model availability SLAs. "Best effort" is not enough; demand 99.5% availability per region.
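Two of these terms are easy to put numbers on. The sketch below is a back-of-the-envelope illustration, not a pricing model: the function names are hypothetical, usage is assumed flat, and the annual review is assumed to capture the full market price drop at each contract anniversary.

```python
def locked_vs_reviewed(monthly_spend, annual_drop, years=3):
    """Return (locked_total, reviewed_total) in dollars over the contract term.

    Assumptions: flat usage; under annual review the rate falls by
    annual_drop (a fraction, e.g. 0.30) at each anniversary.
    """
    yearly = monthly_spend * 12
    locked = yearly * years
    reviewed = sum(yearly * (1 - annual_drop) ** year for year in range(years))
    return locked, reviewed


def downtime_budget_hours(availability, days=30):
    """Hours of downtime an availability target permits over `days` days."""
    return (1 - availability) * days * 24


if __name__ == "__main__":
    for drop in (0.30, 0.50):
        locked, reviewed = locked_vs_reviewed(3_000, drop)
        print(f"{drop:.0%} annual drop: locked ${locked:,.0f}, reviewed ${reviewed:,.0f}")
    print(f"99.5% SLA allows {downtime_budget_hours(0.995):.1f} hours down per 30 days")
```

For a $3k/month workload, a 3-year lock costs $108,000, while annual review at a 30% yearly drop comes to about $78,840 under the same assumptions. A 99.5% target still permits roughly 3.6 hours of downtime per 30-day month per region.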
Pick by maturity
Early experiments: pay-as-you-go on a hyperscaler such as Bedrock or Vertex AI.
Scaled production: negotiated rates on the same hyperscaler plus a specialist vendor for fine-tuning.
Compliance-heavy or sovereign-cloud: self-host with vLLM or NVIDIA Triton, and plan on a horizon of at least 12 months.