Data Platform Cost Optimization
Controlling compute and storage cost on Snowflake, BigQuery, and Databricks.
Compute cost dimensions
Per-query cost (BigQuery on-demand, Athena): you pay per byte scanned. Cheap when queries scan little data; expensive on large scans.
Per-cluster cost (Snowflake, Databricks): you pay for warehouse or cluster runtime, however much data the queries scan. Cheap when the warehouse is right-sized; expensive when it is oversized or always on.
Auto-scaling and auto-suspend cut cost substantially because idle capacity stops billing. Default to suspending after 5-10 minutes idle.
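The two billing models can be compared with a rough cost model. This is a sketch under assumed prices: the $6.25 per TiB scan rate and $3 per credit below are placeholders rather than quotes, and the function names are hypothetical. The runtime model adds the idle time that elapses before auto-suspend fires and applies a 60-second minimum per resume, as Snowflake does.

```python
TIB = 1024 ** 4  # bytes per tebibyte


def per_scan_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Per-query billing: cost depends only on bytes scanned, not on runtime."""
    return bytes_scanned / TIB * usd_per_tib


def per_runtime_cost(busy_s: float, idle_before_suspend_s: float,
                     credits_per_hour: float, usd_per_credit: float,
                     min_billed_s: float = 60.0) -> float:
    """Per-cluster billing: cost depends on how long the warehouse is up.

    The warehouse keeps billing while idle until auto-suspend fires, so the
    idle tail is added to the busy time. min_billed_s models a minimum charge
    per resume (Snowflake bills at least 60 seconds each time it starts).
    """
    billed_s = max(busy_s + idle_before_suspend_s, min_billed_s)
    return billed_s / 3600 * credits_per_hour * usd_per_credit


if __name__ == "__main__":
    # Assumed prices: $3 per credit, a 1-credit/hour warehouse, $6.25 per TiB scanned.
    for idle_s in (300, 600, 3600):
        cost = per_runtime_cost(300, idle_s, credits_per_hour=1, usd_per_credit=3.0)
        print(f"5 min of work, suspend after {idle_s}s idle: ${cost:.2f}")
    # The same job under per-scan billing, assuming it scans 200 GiB.
    print(f"per-scan billing: ${per_scan_cost(200 * 1024 ** 3, 6.25):.2f}")
```

With these assumed prices, the five-minute job costs $0.50 with a 5-minute suspend, $0.75 with a 10-minute suspend, and $3.25 if the warehouse idles for an hour. Very short suspend windows are not free either: each resume pays the minimum charge and the warehouse restarts with a cold cache.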
Storage cost dimensions
Object storage (S3, GCS, Azure Blob): cheap and predictable. Pennies per GB-month.
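Snapshot storage. Snapshots and retained table versions (e.g. Snowflake Time Travel, Delta Lake table history) accumulate over time; retention policies bound the cost.
Data warehouse internal storage. Data is stored compressed, so on platforms that bill compressed bytes the effective cost per raw GB can be lower than keeping the same data uncompressed in object storage.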
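Why retention policies bound snapshot cost can be seen with a simple model. It assumes, hypothetically, that a fixed amount of the table is rewritten every day and that superseded versions are kept until they leave the retention window. The compression helper reflects storage billed on compressed bytes. Names and numbers are illustrative only.

```python
from typing import Optional


def retained_storage_gb(live_gb: float, daily_churn_gb: float, days_elapsed: int,
                        retention_days: Optional[int] = None) -> float:
    """Live data plus superseded versions still held for snapshots/time travel.

    Simplifying assumption: each day rewrites daily_churn_gb of the table and
    the old versions are kept until they age out of the retention window.
    retention_days=None models "keep everything forever".
    """
    if retention_days is None:
        kept_days = days_elapsed
    else:
        kept_days = min(days_elapsed, retention_days)
    return live_gb + daily_churn_gb * kept_days


def effective_cost_per_raw_gb(price_per_stored_gb_month: float,
                              compression_ratio: float) -> float:
    """If storage is billed on compressed bytes, each raw GB costs price / ratio."""
    return price_per_stored_gb_month / compression_ratio


if __name__ == "__main__":
    # Hypothetical table: 1,000 GB live, 50 GB rewritten per day, one year old.
    for retention in (None, 30, 7):
        gb = retained_storage_gb(1000, 50, 365, retention)
        print(f"retention={retention}: {gb:,.0f} GB")
```

For this hypothetical table, a year without retention holds 19,250 GB, while 30-day and 7-day windows hold 2,500 GB and 1,350 GB. Real churn is rarely uniform, so treat the result as an order-of-magnitude estimate.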
Optimisation patterns
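Query optimisation. Partition pruning, columnar reads (selecting only the columns a query needs), and materialised views reduce the data scanned or recomputed. Cost reductions of 5-10x are common.
Workload separation. Run heavy ETL on dedicated warehouses and ad-hoc queries on smaller ones, so each can be sized and suspended independently.
Right-size warehouses. Bigger isn't always faster, and it is rarely free: on Snowflake each size step doubles the credit rate, so a larger warehouse only breaks even if the query finishes proportionally faster. Whether it does depends on query shape: how much of the work parallelises, and whether a smaller warehouse spills to disk.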
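The right-sizing math can be sketched with an Amdahl-style model: only the parallel fraction of a query speeds up when compute doubles. The credit rates are those of Snowflake standard warehouses (XS=1 up to XL=16 credits per hour); the serial fraction and function names are hypothetical, and the model ignores minimum billing, idle time, and the spilling that can make undersized warehouses disproportionately slow.

```python
from typing import Optional

# Snowflake standard warehouses: each size step doubles the credit rate.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def runtime_s(xs_runtime_s: float, size: str, serial_fraction: float) -> float:
    """Amdahl-style estimate: the serial part takes the same time at every
    size; the parallel part shrinks in proportion to compute relative to XS."""
    scale = CREDITS_PER_HOUR[size]
    return xs_runtime_s * (serial_fraction + (1 - serial_fraction) / scale)


def credits_used(xs_runtime_s: float, size: str, serial_fraction: float) -> float:
    """Credits consumed while the query runs (no minimum billing or idle time)."""
    return runtime_s(xs_runtime_s, size, serial_fraction) / 3600 * CREDITS_PER_HOUR[size]


def smallest_size_within(xs_runtime_s: float, serial_fraction: float,
                         deadline_s: float) -> Optional[str]:
    """Smallest size that meets the deadline; under this model it is also the
    cheapest qualifying size, because credits never fall as size grows."""
    for size in CREDITS_PER_HOUR:  # dicts preserve insertion order: XS -> XL
        if runtime_s(xs_runtime_s, size, serial_fraction) <= deadline_s:
            return size
    return None


if __name__ == "__main__":
    # A query that takes one hour on XS and is half serial.
    for size in CREDITS_PER_HOUR:
        t = runtime_s(3600, size, serial_fraction=0.5)
        c = credits_used(3600, size, serial_fraction=0.5)
        print(f"{size:>2}: {t:7.1f}s  {c:4.2f} credits")
```

For the half-serial query, moving from XS to XL cuts runtime from 3,600 s to 1,912.5 s but raises cost from 1.0 to 8.5 credits. A fully parallel query (serial_fraction=0) uses 1.0 credit at every size, which is the case where bigger is faster for free.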
Monitoring data platform cost
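Per-team chargeback. Tag queries and warehouses by team (e.g. Snowflake query tags, BigQuery labels, Databricks cluster tags) and allocate cost from the tags.
Per-query cost visibility for engineers, e.g. a Slack notification when a query exceeds a cost threshold.
Quarterly cost review. Identify the top consumers and target optimisation effort at them.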
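The monitoring steps can be sketched over a list of per-query cost records. This assumes the records have already been exported from the platform's query history (for example Snowflake's ACCOUNT_USAGE views or BigQuery's INFORMATION_SCHEMA.JOBS) with a cost estimate and team tag attached; the QueryCost type and all function names are hypothetical. The alert function only builds message text; delivering it to Slack (e.g. through an incoming webhook) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class QueryCost:
    query_id: str
    user: str
    team_tag: Optional[str]  # None when the query or warehouse was never tagged
    cost_usd: float


def chargeback(records: Iterable[QueryCost]) -> Dict[str, float]:
    """Sum cost per team tag, largest first; untagged spend gets its own bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r.team_tag or "untagged"] += r.cost_usd
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def top_consumers(records: Iterable[QueryCost], n: int = 5) -> List[Tuple[str, float]]:
    """Top-n teams by spend, a starting point for the quarterly review."""
    return list(chargeback(records).items())[:n]


def expensive_query_alerts(records: Iterable[QueryCost],
                           threshold_usd: float) -> List[str]:
    """Alert text for every query at or above the cost threshold."""
    return [
        f":warning: query {r.query_id} by {r.user} "
        f"({r.team_tag or 'untagged'}) cost ${r.cost_usd:,.2f}"
        for r in records
        if r.cost_usd >= threshold_usd
    ]
```

Keeping untagged spend as a visible bucket, rather than dropping it, makes gaps in tagging show up in the review instead of silently shrinking the totals.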