Buying Data Warehouse
Buyer's guide.
Overview
Choosing a data warehouse is mostly about pricing model, query latency at your concurrency, and the gravity of the surrounding cloud. Snowflake, BigQuery, Redshift, and Databricks SQL all run analytical SQL well; the differences show up at the bill, the concurrency wall, and the joins to data already in your cloud of choice.
- Pricing axis. Per-credit (Snowflake), per-byte-scanned (BigQuery), reserved-capacity (Redshift), DBU-per-second (Databricks). Same workload prices very differently across them.
- Concurrency model. Auto-scaling warehouses absorb spikes; fixed clusters need careful workload management to avoid queueing.
- Cloud gravity. Data already in S3, GCS, or ADLS reduces ingestion cost and simplifies IAM if the warehouse runs in the same cloud.
- Per-org decision and exit cost. SQL is portable; the materialised views, UDFs, and procedural code that ride on top are not.
The approach
Trial against your real top-10 queries on your real data volume. Vendor benchmarks use TPC-DS; your queries have nested joins and untidy schemas the benchmark does not.
- Top-10 query inventory. List the queries finance and analytics actually run; replay them on each warehouse and time them.
- Concurrency baseline. Confirm the warehouse handles your peak concurrency without queue depth blowing past your SLO.
- Total cost of ownership model. Add storage, compute, query, and seat licences across a 12-month projected volume. Spot prices on slides lie.
- Document the choice and the exit ramp. Capture rationale and how queries and pipelines would migrate if pricing changed.
Why this compounds
The right warehouse keeps paying back: dashboards stay fast as data grows, finance gets a bill model they can forecast, and analytics decisions stop waiting on infrastructure.
- Cost discipline at scale. The right pricing axis for your volume saves more than negotiating discounts later.
- Faster decision cycles. Sub-second dashboards change how often product asks data questions.
- Reduced platform tax. A managed warehouse removes Postgres-tuning hours from the platform team's calendar.
- Decision trail for the next renewal. The trial data becomes the renewal scorecard, not a cold start.