Performance Test Data Volume

Realistic data sizes.

Overview

Performance test data volume is the discipline of running performance tests against production-shaped data, not against an empty schema or a 100-row fixture. Tests that pass against tiny data sets routinely fail in production because the optimizer chooses different plans, indexes do not get exercised, cardinality assumptions break, and cache hit rates look nothing like real traffic. Production-shaped test data is the most reliable way to catch these issues before users do.

Realistic data sizes. Each test runs against production-equivalent volume; tiny test data hides regressions that real data exposes.
Per-table row count. Each table holds a production-equivalent number of rows; the optimizer picks different plans at different table sizes.
Index hit rates. Each test uses realistic index access patterns; tiny data sets fit in cache and never exercise the index.
Cardinality and quarterly refresh. Each column keeps realistic cardinality, including skewed distributions; a quarterly test data refresh catches drift.
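
To make the cardinality and skew points concrete, the following Python sketch profiles a single column. It is illustrative only: `column_profile`, the fixture, and the production-like column are hypothetical, and the numbers are invented to show the contrast rather than measured from any system.

```python
from collections import Counter


def column_profile(values):
    """Summarize a column: row count, distinct values, and the share of
    rows held by the single most common value (a crude skew measure)."""
    if not values:
        raise ValueError("cannot profile an empty column")
    counts = Counter(values)
    total = len(values)
    _, top_count = counts.most_common(1)[0]
    return {"rows": total, "distinct": len(counts), "top_share": top_count / total}


# Hypothetical 100-row fixture: every customer appears exactly once.
fixture = [f"cust-{i}" for i in range(100)]

# Hypothetical production-like column: one heavy customer plus a long tail.
production_like = ["cust-0"] * 5000 + [f"cust-{i}" for i in range(1, 5001)]

print(column_profile(fixture))
# {'rows': 100, 'distinct': 100, 'top_share': 0.01}
print(column_profile(production_like))
# {'rows': 10000, 'distinct': 5001, 'top_share': 0.5}
```

A query filtered on the heavy customer touches half the table in the second column and one row in the first, which is the kind of difference a uniform fixture cannot reveal.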

The approach

The practical approach is to seed performance test environments from anonymized production exports, refresh them each quarter to match the current production shape, preserve the cardinality and skew that production exhibits, document the data rationale for each test, and treat the test data infrastructure as production-grade rather than as an afterthought. The discipline is in the data shape; the test framework is secondary.
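
One way to decide whether a refresh is overdue is to compare per-table row counts between a production snapshot and the test bed. The sketch below assumes the counts have already been collected into dictionaries; `find_drift`, the table names, and the 20% tolerance are hypothetical choices for illustration, not a recommended standard.

```python
def find_drift(production_rows, test_rows, tolerance=0.2):
    """Return {table: test/production ratio} for every table whose test
    row count differs from production by more than `tolerance`."""
    drifted = {}
    for table, prod_count in production_rows.items():
        test_count = test_rows.get(table, 0)  # a missing table counts as empty
        if prod_count == 0:
            ratio = 1.0 if test_count == 0 else float("inf")
        else:
            ratio = test_count / prod_count
        if abs(ratio - 1.0) > tolerance:
            drifted[table] = ratio
    return drifted


# Hypothetical counts: production has grown since the last refresh.
production = {"orders": 1_000_000, "customers": 50_000}
test_bed = {"orders": 400_000, "customers": 49_000}

print(find_drift(production, test_bed))
# {'orders': 0.4}
```

Row counts are only the coarsest signal; the same comparison could be applied to distinct counts or top-value shares per column.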

Production shape. Each test runs at production-equivalent volume; the optimizer behaves like production rather than like a unit test.
Anonymized data. Each test uses an anonymized production export; this preserves shape without exposing customer data.
Quarterly refresh. Test data is refreshed every quarter; this catches drift as production shape evolves.
Cardinality and documented policy. Each column keeps realistic cardinality and skew; the data rationale for each test is committed for operational review.
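
The anonymization and cardinality points above can be sketched together. The example below replaces identifiers with a keyed hash (HMAC-SHA256 from the Python standard library), so equal inputs map to equal tokens and the frequency distribution of the column is unchanged. The key, `pseudonymize`, `shape`, and the sample emails are all hypothetical. Keyed hashing is pseudonymization rather than full anonymization, so this is a starting point under those assumptions, not a complete privacy control.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key; in practice it would live in a secret store, not in code.
SECRET_KEY = b"example-key-not-for-real-use"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable token: the same input always yields the same
    token, so distinct counts and skew carry over to the test data."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "anon-" + digest[:16]  # truncated for readability


def shape(values):
    """Frequency counts sorted descending, independent of the values themselves."""
    return sorted(Counter(values).values(), reverse=True)


# Hypothetical column with skew: one frequent address and a short tail.
emails = ["a@example.com"] * 3 + ["b@example.com"] * 2 + ["c@example.com"]
anonymized = [pseudonymize(e) for e in emails]

print(shape(emails) == shape(anonymized))
# True
```

Because the mapping is deterministic per key, join columns that share the same key stay joinable across tables after the export.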

Why this compounds

Test data discipline compounds across releases. Each performance test against realistic data produces real evidence; each quarterly refresh keeps the test bed honest; the team builds confidence that performance test results predict production behavior. Without the discipline, performance tests pass while production regresses, and trust in the test framework erodes.

Evidence quality. Realistic data produces meaningful results; a passing test is evidence that production performance will hold.
Release safety. Real-shape testing catches real issues; the regression surfaces in CI rather than in production.
Engineering culture. Real data replaces guessing; the team trusts the performance numbers.
Institutional knowledge. Each test shows how the application behaves on realistic data; the team learns which workloads behave differently at scale.

Performance test data discipline is an engineering practice that pays off over years. Nova AI Ops integrates with performance telemetry, surfaces test patterns, and supports the team’s performance discipline.