Buying AI Platform
Buyer's guide.
The buying question
AI platform buying is the discipline of separating the four sub-categories. Most buyers need one or two; few need all four. Conflating them inflates the contract and slows the rollout.
- Four sub-categories. Serving, training, prompt management, and orchestration per platform. Few buyers need all four; pick the layer that maps to the actual problem.
- Default to managed. Bedrock, Vertex AI, or Azure OpenAI for serving per need; LangSmith or Helicone for prompt management. Self-host only when compliance or cost genuinely requires it.
- Twelve-month checkpoint. Volatile-pricing protection per deal. Long contracts lock in old prices on capabilities that will be commodity by year three.
- Documented driver. Named use case per decision. Catches solution-shopping when the AI vendor pitch is more compelling than the actual gap.
Evaluate by use case
Evaluation runs by use case. Inference, fine-tuning, and agentic workloads each have different vendor maps; one platform rarely covers all three well.
- Inference at scale. Latency, throughput, model availability, and fallback per platform. Bedrock and Vertex are within twenty percent of each other on most metrics; pick by the rest of the cloud stack.
- Fine-tuning. Training infra, dataset management, and evaluation tooling per platform. Specialised vendors (Together AI, Modal) outpace hyperscalers on training-focused workflows.
- Agentic workloads. Orchestration (LangGraph, Claude Agent SDK), tool registries, and observability per platform. Newer market; vendor lock-in risk is real because the abstractions are not yet standard.
- Proof-of-value. Thirty to ninety day POC per platform against representative workloads. Catches capability gaps before purchase rather than after the migration starts.
Hidden costs
The hidden costs add up fast. Tokens, egress, and observability each compound quietly during rollout and rarely fit the original ROI model.
- Per-token pricing dominates. One-billion-token math per workload. At three dollars per million: three thousand a month; at fifteen dollars per million: fifteen thousand.
- Egress fees. Hyperscaler-egress charge per call. Calling a hyperscaler model from outside its cloud adds five to ten percent in data transfer.
- Observability and prompt eval. LangSmith, Helicone, or homegrown layer per platform. Adds five to fifteen percent on top of the model spend.
- Eval-cost line. Eval runs and grading cost per platform. Honest TCO includes the cost of the evaluations that keep prompts working.
Contract terms to negotiate
The contract terms matter. Rate cards, compliance attestations, and SLAs each protect against the volatility that makes the AI market different from traditional SaaS.
- Token rate cards. Annual review clause per deal. Avoid three-year locks at current rates; the market drops thirty to fifty percent per year on equivalent capability.
- Compliance attestations. SOC 2 Type II, GDPR DPA, and data-handling addendum per vendor. Any vendor selling without these is too small for production workloads.
- Model availability SLAs. Ninety-nine point five percent minimum per region. "Best effort" is not enough when the model is on the critical path.
- Model-deprecation clause. Deprecation-notice window per deal. Catches breaking model retirements that would otherwise force rushed migrations.
Pick by maturity
The decision runs by maturity. Early experimentation, scaled production, and compliance-bound deployments each point to a different mix; matching maturity to vendor avoids both over-buying and under-investing.
- Early experiments. Pay-as-you-go hyperscaler pick per team. Bedrock or Vertex; commit nothing until the workload is real.
- Scaled production. Negotiated hyperscaler rate plus specialist for fine-tuning per team. Volume earns the rate; specialisation handles the workloads the hyperscaler does poorly.
- Compliance-heavy or sovereign cloud. Self-hosted vLLM or NVIDIA Triton with a twelve-month plan minimum per team. The operating burden is real; budget honestly.
- Maturity-trigger plan. Documented "next-tier-when" criteria per team. Catches premature scaling when the workload is not yet at the threshold the next tier rewards.